vadnet

(Not recommended) Voice activity detection (VAD) neural network

Since R2023a

vadnet is not recommended. Use the audioPretrainedNetwork function instead.

Syntax

net = vadnet()

Description

net = vadnet() returns a pretrained VAD model.

This function requires both Audio Toolbox™ and Deep Learning Toolbox™.

Examples

Detect Speech with Pretrained VAD Model

Read in an audio signal containing speech and music and listen to the sound.

[audioIn,fs] = audioread("MusicAndSpeech-16-mono-14secs.ogg");
sound(audioIn,fs)

Use vadnetPreprocess to preprocess the audio by computing a mel spectrogram.

features = vadnetPreprocess(audioIn,fs);

Call audioPretrainedNetwork to obtain a pretrained VAD neural network.

net = audioPretrainedNetwork("vadnet");

Pass the preprocessed audio through the network to obtain the probability of speech in each frame.

probs = predict(net,features);

Use vadnetPostprocess to postprocess the network output and determine the boundaries of the speech regions in the signal.

roi = vadnetPostprocess(audioIn,fs,probs)

roi = 2×2

           1       63120
       83600      150000
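
Each row of roi holds the start and end sample index of one detected speech region. As a rough sketch, the indices can be converted to seconds and used to slice the signal. The roi values below are copied from the output above. The sample rate of 16000 Hz is an assumption based on the file name (in the example it comes from audioread), and audioIn is replaced by a synthetic stand-in so that the snippet runs on its own.

```matlab
% Assumed sample rate; in the example, fs is returned by audioread.
fs = 16000;

% Region boundaries copied from the example output (sample indices).
roi = [1 63120; 83600 150000];

% Synthetic stand-in for the audio signal, just long enough to index into.
audioIn = zeros(max(roi(:)),1);

% Start and end time of each region in seconds (sample 1 is time 0).
roiSeconds = (roi - 1)/fs;

% Duration of each region in seconds, counting both end points.
durations = (diff(roi,1,2) + 1)/fs;

% Extract the samples of the first detected speech region.
firstRegion = audioIn(roi(1,1):roi(1,2));
```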

Plot the audio with the detected speech regions.

vadnetPostprocess(audioIn,fs,probs)

[Figure: Detected Speech. Amplitude versus Time (s), showing the audio signal with the detected speech regions marked.]

Use VAD Neural Network on Streaming Audio

Create a dsp.AudioFileReader object to stream an audio file for processing. Set the SamplesPerFrame property to read 100 ms nonoverlapping chunks from the signal.

afr = dsp.AudioFileReader("MaleVolumeUp-16-mono-6secs.ogg");
analysisDuration = 0.1; % seconds
afr.SamplesPerFrame = floor(analysisDuration*afr.SampleRate);

The vadnet architecture does not retain state between calls, and it performs best when analyzing larger chunks of audio signals. When you use vadnet in a streaming scenario, specific application requirements of accuracy, computational efficiency, and latency dictate the analysis duration and whether to overlap analysis chunks.
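
To make the overlap trade-off concrete, the following sketch computes the start index of each analysis chunk for a given chunk duration and overlap. It is an illustration only: the sample rate, the 50% overlap, and the synthetic signal are assumptions, and the network call is left as a comment so that the snippet runs without any toolbox.

```matlab
fs = 16000;                % assumed sample rate (Hz)
analysisDuration = 0.1;    % seconds per analysis chunk, as in this example
overlapFraction = 0.5;     % hypothetical choice: 50% overlap between chunks

% Samples per chunk and samples advanced between consecutive chunks.
chunkLen = floor(analysisDuration*fs);
hopLen = chunkLen - round(overlapFraction*chunkLen);

% One second of synthetic audio as a stand-in for a real signal.
x = randn(fs,1);

% Index of the first sample of every complete chunk.
starts = 1:hopLen:(numel(x) - chunkLen + 1);

for k = 1:numel(starts)
    chunk = x(starts(k):starts(k) + chunkLen - 1);
    % In a real pipeline, each chunk would be analyzed here, for example:
    % features = vadnetPreprocess(chunk,fs);
    % probs = predict(net,features);
end
```

A larger overlap gives more frequent probability updates for the same chunk length, at the cost of running the network more often per second of audio.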

Create a timescope object to plot the audio signal and the corresponding speech probabilities. Create an audioDeviceWriter to play the audio as you stream it.

scope = timescope(NumInputPorts=2, ...
    SampleRate=afr.SampleRate, ...
    TimeSpanSource="property",TimeSpan=5, ...
    YLimits=[-1.2,1.2], ...
    ShowLegend=true,ChannelNames=["Audio","Speech Probability"]);
adw = audioDeviceWriter(afr.SampleRate);

Call audioPretrainedNetwork to obtain a pretrained VAD neural network.

net = audioPretrainedNetwork("vadnet");

In a streaming loop:

Read in a 100 ms chunk from the audio file.
Preprocess the audio into a mel spectrogram using vadnetPreprocess.
Use the VAD network to predict the probability of speech in each frame of the spectrogram. Replicate the probabilities to correspond to each sample in the audio signal.
Plot the audio signal and the probabilities of speech.
Play the audio with the device writer.

hop = 0.01 * afr.SampleRate;
while ~isDone(afr)
    audioIn = afr();

    features = vadnetPreprocess(audioIn,afr.SampleRate);
    probs = predict(net,features);
    % Replicate probs to correspond to samples in audioIn
    probs = repelem(probs,hop)';
    probs = probs((hop/2)+1:end-hop/2);

    scope(audioIn,probs)
    adw(audioIn);
end
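
The replicate-and-trim step in the loop can be checked in isolation. The sketch below assumes a 16 kHz sample rate and assumes that the network returns one probability per 10 ms hop plus one extra frame for a 100 ms chunk. That frame count is inferred from the trimming in the loop, not taken from the vadnet documentation, and the probabilities are synthetic.

```matlab
fs = 16000;               % assumed sample rate (Hz)
hop = 0.01*fs;            % hop length in samples, as in the loop
chunkLen = floor(0.1*fs); % samples in one 100 ms chunk

% Assumed number of frames per chunk: one per hop, plus one.
numFrames = chunkLen/hop + 1;

% Synthetic frame probabilities standing in for the network output.
probs = linspace(0,1,numFrames);

% Repeat each frame probability hop times, then drop half a hop from
% each end so that the result lines up with the samples of the chunk.
perSample = repelem(probs,hop)';
perSample = perSample((hop/2)+1:end-hop/2);

% Under the stated assumption, there is one probability per audio sample.
assert(numel(perSample) == chunkLen)
```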

Output Arguments

`net` — Pretrained VAD model
`DAGNetwork` object

Pretrained VAD neural network, returned as a DAGNetwork (Deep Learning Toolbox) object.

Algorithms

The neural network is a ported version of the vad-crdnn-libriparty pretrained model provided by SpeechBrain [1], which combines convolutional, recurrent, and fully connected layers.

References

[1] Ravanelli, Mirco, et al. "SpeechBrain: A General-Purpose Speech Toolkit." arXiv, 8 June 2021. http://arxiv.org/abs/2106.04624

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Variable-size input is not supported.
To create a SeriesNetwork or DAGNetwork object for code generation, see Load Pretrained Networks for Code Generation (MATLAB Coder).

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Usage notes and limitations:

Variable-size input is not supported.
To create a SeriesNetwork or DAGNetwork object for code generation, see Load Pretrained Networks for Code Generation (GPU Coder).

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2023a
