openl3Preprocess

Preprocess audio for OpenL3 feature extraction

Since R2021a

Syntax

features = openl3Preprocess(audioIn,fs)

features = openl3Preprocess(audioIn,fs,Name,Value)

Description

features = openl3Preprocess(audioIn,fs) generates spectrograms from audioIn that can be fed to the OpenL3 pretrained network.

features = openl3Preprocess(audioIn,fs,Name,Value) specifies options using one or more Name,Value arguments. For example, features = openl3Preprocess(audioIn,fs,'OverlapPercentage',75) applies a 75% overlap between consecutive frames used to generate the spectrograms.

Examples

Extract OpenL3 Embeddings from Audio Signal

Use openl3Preprocess to generate spectrograms from an audio signal, and then pass the spectrograms to the OpenL3 pretrained network to extract embeddings.

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

To extract spectrograms from the audio, call the openl3Preprocess function with the audio and sample rate. Use 50% overlap and set the spectrum type to linear. The openl3Preprocess function returns an array of 30 spectrograms produced using an FFT length of 512.

features = openl3Preprocess(audioIn,fs,OverlapPercentage=50,SpectrumType="linear");
[posFFTbinsOvLap50,numHopsOvLap50,~,numSpectOvLap50] = size(features)

posFFTbinsOvLap50 = 257

numHopsOvLap50 = 197

numSpectOvLap50 = 30

Call openl3Preprocess again, this time using the default overlap of 90%. The openl3Preprocess function now returns an array of 146 spectrograms.

features = openl3Preprocess(audioIn,fs,SpectrumType="linear");
[posFFTbinsOvLap90,numHopsOvLap90,~,numSpectOvLap90] = size(features)

posFFTbinsOvLap90 = 257

numHopsOvLap90 = 197

numSpectOvLap90 = 146

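Select one of the spectrograms at random and plot the spectrum of its first time hop.

randSpect = randi(numSpectOvLap90);
viewRandSpect = features(:,:,:,randSpect);
% Rows hold the one-sided FFT bins, so take the bin count from dimension 1.
numBins = size(viewRandSpect,1);
% Map bins to 0..fs/2. This assumes the spectrogram is computed at the input
% sample rate fs; if the function resamples internally, use that rate instead.
binsToHz = (0:numBins-1)*(fs/2)/(numBins-1);
semilogx(binsToHz,mag2db(abs(viewRandSpect(:,1))))
xlabel("Frequency (Hz)")
ylabel("Power (dB)")
title("Spectrogram " + randSpect)
axis tight
grid on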

Figure: line plot of the randomly selected spectrogram (the 19th in this run), with Frequency (Hz) on the x-axis and Power (dB) on the y-axis.

Create an OpenL3 network using the same SpectrumType.

net = audioPretrainedNetwork("openl3",SpectrumType="linear");

Extract and visualize the audio embeddings.

embeddings = predict(net,features);
surf(embeddings,EdgeColor="none")
view([90,-90])
axis([1 numSpectOvLap90 1 numSpectOvLap90])
xlabel("Embedding Length")
ylabel("Spectrum Number")
title("OpenL3 Feature Embeddings")
axis tight

Figure: surface plot titled "OpenL3 Feature Embeddings", with axes labeled Embedding Length and Spectrum Number.

Input Arguments

`audioIn` — Input signal
column vector | matrix

Input signal, specified as a column vector or matrix. If you specify a matrix, openl3Preprocess treats the columns of the matrix as individual audio channels.

Data Types: single | double
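
As a minimal sketch of the multichannel case, the following code builds a synthetic stereo signal and compares the number of spectrograms returned for mono and stereo input. The tone frequencies, duration, sample rate, and variable names are illustrative assumptions. This page does not state how the per-channel spectrograms are ordered along the fourth dimension, so the sketch inspects only the counts.

```matlab
fs = 44100;                            % assumed sample rate for this sketch
t = (0:5*fs-1)'/fs;                    % 5 seconds of sample times, as a column
mono = sin(2*pi*440*t);                % one channel: 440 Hz tone
stereo = [mono, sin(2*pi*880*t)];      % two channels stored as columns

featuresMono = openl3Preprocess(mono,fs);
featuresStereo = openl3Preprocess(stereo,fs);

% K is the size of the fourth dimension. It depends on the channel count,
% so the stereo input is expected to yield more spectrograms than the mono input.
numSpectMono = size(featuresMono,4)
numSpectStereo = size(featuresStereo,4)
```

If the audio is stored with channels along rows, transpose it before the call so that each channel occupies one column.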

`fs` — Sample rate (Hz)
positive scalar

Sample rate of the input signal in Hz, specified as a positive scalar.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: openl3Preprocess(audioIn,fs,'SpectrumType','mel256')
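
The two calling styles can be compared side by side. This is a small sketch that assumes a synthetic 3-second tone as input (the signal and variable names are illustrative); both calls pass the same options, in a different order and with different syntax.

```matlab
fs = 44100;                                  % assumed sample rate
audioIn = sin(2*pi*440*(0:3*fs-1)'/fs);      % 3-second synthetic tone

% Name=Value syntax
featuresA = openl3Preprocess(audioIn,fs,SpectrumType="mel256",OverlapPercentage=75);

% Comma-separated syntax with quoted names; the order of the pairs does not matter
featuresB = openl3Preprocess(audioIn,fs,'OverlapPercentage',75,'SpectrumType','mel256');

% Both calls specify the same options, so the outputs should match
sameResult = isequal(featuresA,featuresB)
```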

`OverlapPercentage` — Percentage overlap between consecutive spectrograms
`90` (default) | scalar in the range [0,100)

Percentage overlap between consecutive spectrograms, specified as a scalar in the range [0,100).

Data Types: single | double
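
To get a feel for how the overlap affects the number of spectrograms, here is a rough counting model. It is not the function's documented algorithm: it assumes that each spectrogram covers a 1-second window, that the audio is processed at 48 kHz, and that no padding is applied. None of these assumptions are stated on this page, and the 15.5-second signal length is hypothetical.

```matlab
% Rough model only; the window length, processing rate, and lack of padding
% are assumptions, not documented behavior.
internalFs = 48000;                      % assumed processing rate (Hz)
winLen = internalFs;                     % assumed 1-second window, in samples
numSamples = round(15.5*internalFs);     % hypothetical 15.5-second mono signal

overlapPercentage = [0 50 75 90];
hopLen = round(winLen*(1 - overlapPercentage/100));      % samples between window starts
estimatedK = floor((numSamples - winLen)./hopLen) + 1    % returns [15 30 59 146]
```

Under these assumptions the model gives 30 spectrograms at 50% overlap and 146 at 90% overlap, the same counts that the example on this page reports for its roughly 15-second file. That agreement is suggestive, not a confirmation of the internal algorithm. The practical takeaway is that a higher overlap shortens the hop between windows, so the number of spectrograms (and the memory they occupy) grows quickly as the overlap approaches 100%.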

`SpectrumType` — Spectrum type
`'mel128'` (default) | `'mel256'` | `'linear'`

Spectrum type generated from audio and used as input to the neural network, specified as one of these:

'mel128' –– Generates mel spectrograms using 128 mel bands.
'mel256' –– Generates mel spectrograms using 256 mel bands.
'linear' –– Generates positive one-sided spectrograms using an FFT length of 512.

Data Types: char | string

Output Arguments

`features` — Spectrograms that can be fed to OpenL3 pretrained network
N-by-M-by-1-by-K array

Spectrograms generated from audioIn, returned as an N-by-M-by-1-by-K array.

The first two dimensions depend on the value of 'SpectrumType':

'mel128' –– The dimensions are 128-by-199-by-1-by-K, where 128 is the number of mel bands and 199 is the number of time hops.
'mel256' –– The dimensions are 256-by-199-by-1-by-K, where 256 is the number of mel bands and 199 is the number of time hops.
'linear' –– The dimensions are 257-by-197-by-1-by-K, where 257 is the number of bins in the positive one-sided spectrum (FFT length of 512, so 512/2 + 1 = 257) and 197 is the number of time hops.

K is the number of spectrograms. It depends on the length of audioIn, the number of channels in audioIn, and OverlapPercentage.

Data Types: single
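
The dimensions listed above can be checked with a short loop. This sketch assumes a synthetic 3-second tone as input (the signal and variable names are illustrative); the value of K it prints depends on the signal length and the default overlap.

```matlab
fs = 44100;                                  % assumed sample rate
audioIn = sin(2*pi*440*(0:3*fs-1)'/fs);      % 3-second synthetic tone

spectrumTypes = ["mel128","mel256","linear"];
for k = 1:numel(spectrumTypes)
    features = openl3Preprocess(audioIn,fs,SpectrumType=spectrumTypes(k));
    [numBands,numHops,~,numSpect] = size(features);
    % Print N-by-M-by-1-by-K and the output data type for this spectrum type
    fprintf("%-7s: %d-by-%d-by-1-by-%d, class %s\n", ...
        spectrumTypes(k),numBands,numHops,numSpect,class(features));
end

% For 'linear', the row count follows from the FFT length of 512
fftLength = 512;
oneSidedBins = fftLength/2 + 1
```

Because the first two dimensions differ between spectrum types, the spectrograms must be generated with the same SpectrumType as the network they are passed to.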

References

[1] Cramer, Jason, et al. "Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings." In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 3852-56. doi:10.1109/ICASSP.2019.8682475.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
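
A minimal sketch of GPU usage follows. It assumes that Parallel Computing Toolbox is installed and a supported GPU is available; the synthetic input signal and the variable names are illustrative.

```matlab
fs = 44100;                                  % assumed sample rate
audioIn = sin(2*pi*440*(0:3*fs-1)'/fs);      % 3-second synthetic tone

audioInGPU = gpuArray(audioIn);              % move the audio to GPU memory
featuresGPU = openl3Preprocess(audioInGPU,fs);

class(featuresGPU)                           % display where the result lives
featuresCPU = gather(featuresGPU);           % bring the result back to the CPU
```

Calling gather is only needed when later steps require the data in host memory.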

Version History

Introduced in R2021a
