
enhanceSpeech

Enhance speech signal

Since R2024a

    Description

    audioOut = enhanceSpeech(audioIn,fs) enhances the speech in the audio signal audioIn, sampled at fs Hz, by reducing non-speech sounds.


    enhanceSpeech(audioIn,fs) with no output arguments plots the original and enhanced speech as time-domain waveforms and spectrograms.

    This function requires both Audio Toolbox™ and Deep Learning Toolbox™.


    Examples


    Download Model Files

    Try calling enhanceSpeech at the command line. If the required model files are not installed, the function throws an error and provides a link to download them. Click the link, and unzip the file to a location on the MATLAB path.

    Alternatively, run the following commands to download the enhanceSpeech model files, unzip them to your temporary folder, and add them to the MATLAB path.

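    % Download the model archive into the temporary folder
    downloadFolder = fullfile(tempdir,"enhanceSpeechDownload");
    loc = websave(downloadFolder,"https://ssd.mathworks.com/supportfiles/audio/enhanceSpeech.zip");
    % Unzip the archive there, then add the model folder to the MATLAB path
    modelsLocation = tempdir;
    unzip(loc,modelsLocation)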
    addpath(fullfile(modelsLocation,"enhanceSpeech"))
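    If you rerun the example, you can skip the download when the model folder is already present. The following is a minimal sketch that assumes, as the addpath call above does, that the archive unzips into a folder named enhanceSpeech under tempdir; the variable name modelFolder is illustrative.

```matlab
% Sketch: download and unzip the enhanceSpeech model files only once.
% Assumes the archive unzips into a folder named "enhanceSpeech" under
% tempdir; modelFolder is an illustrative variable name.
modelsLocation = tempdir;
modelFolder = fullfile(modelsLocation,"enhanceSpeech");
if ~isfolder(modelFolder)
    % Folder not found, so fetch and extract the archive
    downloadFolder = fullfile(tempdir,"enhanceSpeechDownload");
    loc = websave(downloadFolder, ...
        "https://ssd.mathworks.com/supportfiles/audio/enhanceSpeech.zip");
    unzip(loc,modelsLocation)
end
addpath(modelFolder)
```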

    Enhance Speech

    Read in an audio file containing speech and noise. Listen to the signal.

    [noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
    sound(noisySpeech,fs)

    Use enhanceSpeech to reduce the non-speech sounds in the signal. Listen to the enhanced signal.

    enhancedSpeech = enhanceSpeech(noisySpeech,fs);
    sound(enhancedSpeech,fs)
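    The sound function returns immediately, so when the two sound calls above run in one script, the noisy and enhanced signals can play on top of each other. A minimal alternative, assuming you want to hear them one after the other, uses audioplayer objects with playblocking, which waits for playback to finish.

```matlab
% Play the noisy and enhanced signals sequentially.
% playblocking does not return until playback ends, unlike sound.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
enhancedSpeech = enhanceSpeech(noisySpeech,fs);

noisyPlayer = audioplayer(noisySpeech,fs);
playblocking(noisyPlayer)      % blocks until the noisy signal ends

enhancedPlayer = audioplayer(enhancedSpeech,fs);
playblocking(enhancedPlayer)   % blocks until the enhanced signal ends
```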

    Call enhanceSpeech with no output arguments to plot both the noisy signal and the enhanced signal.

    enhanceSpeech(noisySpeech,fs);

    [Figure: time-domain waveforms (Time Analysis) and spectrograms (Spectral Analysis) of the audio input (top row) and the audio output (bottom row). Time is in seconds and frequency in kHz.]

    Evaluate Enhancement Using STOI

    Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

    [noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
    reference = audioread("CleanSpeech-16-mono-3secs.ogg");

    Calculate the short-time objective intelligibility (STOI) metric of the noisy speech signal by using the stoi function.

    noisySpeechSTOI = stoi(noisySpeech,reference,fs)
    noisySpeechSTOI = 0.8370
    

    Use enhanceSpeech to enhance the speech signal. Then evaluate the enhanced signal using the STOI metric and compare the result with the STOI of the noisy signal. A higher STOI indicates more intelligible speech.

    enhancedSpeech = enhanceSpeech(noisySpeech,fs);
    enhancedSpeechSTOI = stoi(enhancedSpeech,reference,fs)
    enhancedSpeechSTOI = single
        0.8808
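    To report the STOI change directly instead of reading it off the two displayed values, you can compute the difference. This is a small sketch; the variable name stoiGain is illustrative. Because enhanceSpeech returns single, the enhanced STOI is single, so both scores are cast to double before subtracting.

```matlab
% Report the change in STOI produced by enhanceSpeech.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

noisySpeechSTOI = stoi(noisySpeech,reference,fs);
enhancedSpeech = enhanceSpeech(noisySpeech,fs);
enhancedSpeechSTOI = stoi(enhancedSpeech,reference,fs);

% Cast to double so the difference is computed in double precision
stoiGain = double(enhancedSpeechSTOI) - double(noisySpeechSTOI);
fprintf("STOI: %.4f -> %.4f (change %+.4f)\n", ...
    noisySpeechSTOI,enhancedSpeechSTOI,stoiGain)
```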
    

    Evaluate Enhancement Using ViSQOL

    Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

    [noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
    reference = audioread("CleanSpeech-16-mono-3secs.ogg");

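    Calculate the Virtual Speech Quality Objective Listener (ViSQOL) mean opinion score (MOS) of the noisy speech signal by using the visqol function with Mode set to "speech".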

    noisySpeechMOS = visqol(noisySpeech,reference,fs,Mode="speech")
    noisySpeechMOS = 2.9550
    

    Use enhanceSpeech to enhance the speech signal. Then evaluate the enhanced signal using the ViSQOL metric and compare the result with the MOS of the noisy signal. A higher MOS indicates better perceived quality.

    enhancedSpeech = enhanceSpeech(noisySpeech,fs);
    enhancedSpeechMOS = visqol(enhancedSpeech,reference,fs,Mode="speech")
    enhancedSpeechMOS = single
        3.2205
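    STOI measures intelligibility and ViSQOL measures perceived quality, so it can help to view both side by side. The following sketch collects the two metrics for the noisy and enhanced signals in one table; the row and variable names are illustrative choices, not part of any documented output.

```matlab
% Summarize STOI (intelligibility) and ViSQOL MOS (quality) for the
% noisy and enhanced signals in a single table.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");
enhancedSpeech = enhanceSpeech(noisySpeech,fs);

signals = {noisySpeech, enhancedSpeech};
stoiScores = zeros(2,1);
mosScores = zeros(2,1);
for k = 1:2
    % Cast to double so both rows share one numeric type
    stoiScores(k) = double(stoi(signals{k},reference,fs));
    mosScores(k) = double(visqol(signals{k},reference,fs,Mode="speech"));
end

results = table(stoiScores,mosScores, ...
    VariableNames=["STOI","ViSQOL_MOS"], ...
    RowNames=["Noisy";"Enhanced"])
```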
    

    Input Arguments


    audioIn: Audio input

    Audio input containing the speech signal to enhance, specified as a column vector (single channel).

    Data Types: single | double
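    Because audioIn must be a single-channel column vector, a multichannel recording needs preprocessing. One possible approach, sketched below and not a documented enhanceSpeech feature, is to enhance each channel independently. For illustration, the two-channel signal x is built from the example file; in practice it would come from your own recording.

```matlab
% Sketch: enhance a multichannel signal one channel at a time, because
% enhanceSpeech accepts only a single-channel column vector.
[mono,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
x = [mono, 0.5*mono];            % illustrative two-channel stand-in

y = zeros(size(x),"single");     % enhanceSpeech returns single
for ch = 1:size(x,2)
    y(:,ch) = enhanceSpeech(x(:,ch),fs);
end
```

    If a single output channel is acceptable, downmixing with mean(x,2) before calling enhanceSpeech is a cheaper alternative, at the cost of discarding the spatial information.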

    fs: Sample rate

    Sample rate in Hz, specified as a positive scalar. The enhanceSpeech function requires a sample rate of at least 4000 Hz.

    Data Types: single | double
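    If your audio might be sampled below the 4000 Hz minimum, you can guard the call by resampling first. This is a minimal sketch; the 16 kHz target rate is an assumption chosen for illustration, not a documented requirement, and resample comes from Signal Processing Toolbox.

```matlab
% Sketch: resample inputs below the 4000 Hz minimum before calling
% enhanceSpeech. targetFs is a hypothetical choice for illustration.
[x,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
minFs = 4000;        % minimum sample rate required by enhanceSpeech
targetFs = 16000;    % assumed target rate for low-rate inputs
if fs < minFs
    x = resample(x,targetFs,fs);   % Signal Processing Toolbox
    fs = targetFs;
end
y = enhanceSpeech(x,fs);
```

    Upsampling only satisfies the input constraint; it does not restore content above the original Nyquist frequency.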

    Output Arguments


    audioOut: Enhanced audio

    Audio output containing the enhanced speech signal, returned as a column vector with the same size and sample rate as the input signal.

    Data Types: single
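    Since audioread returns double by default while enhanceSpeech returns single, downstream code that assumes double precision may need a conversion. The following sketch checks the documented output properties and then casts the result; the variable name enhancedDouble is illustrative.

```matlab
% Check the documented output properties (column vector, same size as
% the input, single precision), then convert to double if needed.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
enhancedSpeech = enhanceSpeech(noisySpeech,fs);

assert(iscolumn(enhancedSpeech))
assert(isequal(size(enhancedSpeech),size(noisySpeech)))
assert(isa(enhancedSpeech,"single"))

enhancedDouble = double(enhancedSpeech);   % for double-precision code
```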

    Algorithms

    The enhanceSpeech function uses a pretrained MetricGAN-OKD [1] neural network to enhance speech signals.

    References

    [1] Shin, Wooseok, Byung Hoon Lee, Jin Sob Kim, Hyun Joon Park, and Sung Won Han. "MetricGAN-OKD: multi-metric optimization of MetricGAN via online knowledge distillation for speech enhancement." In International Conference on Machine Learning, pp. 31521-31538. PMLR, 2023.

    Extended Capabilities

    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Arrays
    Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
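    The following sketch assumes, as the GPU Arrays capability indicates, that enhanceSpeech accepts gpuArray input. It falls back to the CPU when canUseGPU reports that no supported GPU is available, and it uses gather to copy the result back to host memory.

```matlab
% Sketch: run enhanceSpeech on the GPU when one is available.
% Requires Parallel Computing Toolbox and a supported GPU.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
if canUseGPU
    enhancedSpeech = enhanceSpeech(gpuArray(noisySpeech),fs);
    enhancedSpeech = gather(enhancedSpeech);   % copy back to host memory
else
    enhancedSpeech = enhanceSpeech(noisySpeech,fs);   % CPU fallback
end
```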

    Version History

    Introduced in R2024a