identifyLanguage

Identify languages in speech signals

Since R2024b

    Description

    language = identifyLanguage(audioIn,fs) returns the language identified in the given speech signal.

    This function requires Deep Learning Toolbox™.

    language = identifyLanguage(audioIn,fs,LanguageIDFormat=format) specifies the format of the returned language identification.

    [language,score] = identifyLanguage(___) also returns the confidence score associated with the language identification.

    [language,score,results] = identifyLanguage(___) also returns a table containing all languages the pretrained network can identify, their respective scores, and the ISO language codes.

    identifyLanguage(___) with no output arguments plots a bar graph of the five highest-scoring languages.

    Examples

    Call identifyLanguage at the command line. If the required model files are not installed, the function throws an error and provides a link to download them. Click the link and unzip the file to a location on the MATLAB path.

    Alternatively, execute the following commands to download and unzip the identifyLanguage model files to your temporary directory.

    % Download the model archive to the temporary directory.
    downloadFolder = fullfile(tempdir,"identifyLanguageDownload");
    loc = websave(downloadFolder,"https://ssd.mathworks.com/supportfiles/audio/lang-id-voxlingua107-ecapa-weights.zip");
    % Unzip the archive and add the extracted folder to the MATLAB path.
    modelsLocation = tempdir;
    unzip(loc,modelsLocation)
    addpath(fullfile(modelsLocation,"lang-id-voxlingua107-ecapa-weights"))

    Read in an audio signal containing English speech and use identifyLanguage to identify the language spoken.

    [x,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");
    lang = identifyLanguage(x,fs)
    lang = 
    "english"
    

    Read in another signal containing a phrase in Polish and identify the language.

    [x,fs] = audioread("polish.wav");
    lang = identifyLanguage(x,fs)
    lang = 
    "polish"
    

    Call identifyLanguage with no output arguments to plot the top 5 detected languages and their scores.

    identifyLanguage(x,fs)

    [Figure: bar chart titled "Languages Detected (Top 5)" with y-axis label "Network Score".]
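
A comparable plot can be built by hand from the table returned as the third output, which is useful when you want to restyle it. The following is a minimal sketch, not the function's own plotting code: the names and scores are copied from the first five rows of the results table for Counting-16-44p1-mono-15secs.wav shown later in these examples, and the variable names are illustrative.

```matlab
% Top five rows copied from the results table example on this page; in
% practice, take them from the third output of identifyLanguage.
names = ["english";"albanian";"swedish";"latin";"maltese"];
scores = single([0.32889;0.14848;0.12257;0.11784;0.06679]);

% Use the score ordering for the category order so that the bars are not
% rearranged alphabetically.
langs = categorical(names,names);

bar(langs,scores)
title("Languages Detected (Top 5)")
ylabel("Network Score")
```

Passing the names a second time as the value set fixes the category order, so the bars appear from highest to lowest score.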

    Read in an audio signal containing English speech and use identifyLanguage with LanguageIDFormat set to "ISO-639" to get the ISO code of the identified language.

    [x,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");
    lang = identifyLanguage(x,fs,LanguageIDFormat="ISO-639")
    lang = 
    "en"
    

    Read in an audio signal containing English speech and use identifyLanguage to identify the language spoken and get the confidence score of the identification. Note the high confidence score of this prediction.

    [x,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");
    [lang,score] = identifyLanguage(x,fs)
    lang = 
    "english"
    
    score = single
    
    0.9998
    

    Read in another signal containing English speech. Use identifyLanguage to get the language identification, the confidence score, and a table with the results for all supported languages. The language is identified correctly, but the confidence is lower, likely due to the limited vocabulary and sparsity of speech in the signal.

    [x,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
    [lang,score,results] = identifyLanguage(x,fs)
    lang = 
    "english"
    
    score = single
    
    0.3289
    
    results=107×3 table
        LanguageName    LanguageCode      Score  
        ____________    ____________    _________
    
        "english"           "en"          0.32889
        "albanian"          "sq"          0.14848
        "swedish"           "sv"          0.12257
        "latin"             "la"          0.11784
        "maltese"           "mt"          0.06679
        "arabic"            "ar"         0.047516
        "yiddish"           "yi"         0.047516
        "bosnian"           "bs"          0.02298
        "croatian"          "hr"         0.014139
        "slovenian"         "sl"         0.010902
        "welsh"             "cy"         0.010865
        "korean"            "ko"        0.0078524
        "hebrew"            "he"        0.0072787
        "afrikaans"         "af"        0.0066797
        "tagalog"           "tl"         0.005378
        "lao"               "lo"        0.0050288
          ⋮
    
    

    Input Arguments

    audioIn: Speech signal, specified as a column vector. The minimum duration of the speech signal is 0.5 seconds.

    Data Types: single | double

    fs: Sample rate in Hz, specified as a scalar. The identifyLanguage function requires a sample rate of at least 4000 Hz.

    Data Types: single | double
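
The two constraints above (a column vector lasting at least 0.5 seconds, sampled at 4000 Hz or higher) can be checked before calling the function. The following is a minimal sketch under stated assumptions: the synthetic two-channel signal stands in for audio loaded with audioread, the variable names are illustrative, and averaging the channels is one common way to downmix rather than something identifyLanguage prescribes.

```matlab
% Illustrative stand-in for audio loaded with audioread, which returns one
% column per channel: 1 s of two-channel noise.
fs = 16000;
x = 0.01*randn(fs,2);

% Constraints stated in the argument descriptions.
minDuration = 0.5;      % seconds
minSampleRate = 4000;   % Hz

% Average the channels to obtain a single column vector (assumed downmix).
if size(x,2) > 1
    x = mean(x,2);
end

durationSec = numel(x)/fs;
isValidInput = iscolumn(x) && durationSec >= minDuration && fs >= minSampleRate;

if ~isValidInput
    error("Signal must be a column vector of at least %.1f s sampled at %d Hz or higher.", ...
        minDuration,minSampleRate);
end
```

Checking the inputs first produces a clear message for a signal that is too short or sampled too slowly, before the model is run.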

    LanguageIDFormat: Format of language identification returned by identifyLanguage, specified as "english-name" or "ISO-639".

    • "english-name": identifyLanguage returns the language as a string containing the common English-language name for the language, such as "spanish" or "japanese".

    • "ISO-639": identifyLanguage returns the language as a string containing the two-letter ISO 639-1 code for the language. If the language does not have an ISO 639-1 code, then the function returns the three-letter ISO 639-2 code.

    Data Types: char | string

    Output Arguments

    language: Language identified in the speech signal, returned as a string. The LanguageIDFormat argument determines the format of the returned string.

    score: Score for the identified language, returned as a scalar of data type single. This score can be interpreted as confidence in the language identification.
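
Because the score can be read as a confidence, one possible use is to reject identifications that fall below a cutoff. The following is a minimal sketch: the languages and scores are copied from the two English examples on this page, and the cutoff minScore is a hypothetical value that you would need to tune on your own data.

```matlab
% Values copied from the two English examples on this page.
langs = ["english";"english"];
scores = single([0.9998;0.3289]);

% Hypothetical cutoff, chosen only for illustration.
minScore = 0.5;

% Replace low-confidence identifications with a missing string.
accepted = langs;
accepted(scores < minScore) = missing;
```

With this cutoff, the first identification is kept and the second is marked as missing, even though both are correct. A fixed threshold trades missed identifications against false ones.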

    results: All language identification results from the speech input, returned as a table containing all scores and corresponding languages and language codes. The table contains the variables LanguageName, LanguageCode, and Score. The rows are sorted in descending order by the Score variable.
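
Because the table is sorted by descending score, standard table indexing is enough to pull out the best match, a short list of candidates, or the score of one language. The following is a minimal sketch that uses a five-row stand-in built from the first rows of the table in the Examples section. A table returned by identifyLanguage has one row per supported language, and the derived variable names are illustrative.

```matlab
% First five rows copied from the results table in the Examples section.
LanguageName = ["english";"albanian";"swedish";"latin";"maltese"];
LanguageCode = ["en";"sq";"sv";"la";"mt"];
Score = single([0.32889;0.14848;0.12257;0.11784;0.06679]);
results = table(LanguageName,LanguageCode,Score);

% Rows are sorted by descending score, so the first row is the best match.
bestName = results.LanguageName(1);
bestCode = results.LanguageCode(1);

% Keep the three highest-scoring candidates.
top3 = results(1:3,:);

% Look up the score of a specific language by its code.
swedishScore = results.Score(results.LanguageCode == "sv");

% Margin between the best and second-best candidates. A small margin
% suggests an ambiguous identification.
margin = results.Score(1) - results.Score(2);
```

Reading both the name and the code from the first row gives both forms of the identification from a single call, instead of calling the function once per format.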

    Algorithms

    The identifyLanguage function uses an ECAPA-TDNN [1] model to identify languages. This neural network uses pretrained weights from the lang-id-voxlingua107-ecapa model provided by SpeechBrain [2].

    References

    [1] Desplanques, Brecht, Jenthe Thienpondt, and Kris Demuynck. “ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification.” In Interspeech 2020, 3830–34. ISCA, 2020. https://doi.org/10.21437/Interspeech.2020-2650.

    [2] Ravanelli, Mirco, et al. “SpeechBrain: A General-Purpose Speech Toolkit.” arXiv, June 8, 2021. http://arxiv.org/abs/2106.04624.

    Extended Capabilities

    GPU Arrays
    Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

    Version History

    Introduced in R2024b