openl3Embeddings

Extract OpenL3 feature embeddings

Since R2022a

Syntax

embeddings = openl3Embeddings(audioIn,fs)

embeddings = openl3Embeddings(audioIn,fs,Name=Value)

Description

embeddings = openl3Embeddings(audioIn,fs) returns OpenL3 feature embeddings over time for audio input audioIn with sample rate fs. Columns of the input are treated as individual channels.

embeddings = openl3Embeddings(audioIn,fs,Name=Value) specifies options using one or more name-value arguments. For example, embeddings = openl3Embeddings(audioIn,fs,OverlapPercentage=75) applies a 75% overlap between consecutive frames used to create the audio embeddings.

This function requires both Audio Toolbox™ and Deep Learning Toolbox™.

Examples

Download `openl3Embeddings` Functionality

Download and unzip the Audio Toolbox™ model for OpenL3.

Type openl3Embeddings at the command line. If the Audio Toolbox model for OpenL3 is not installed, the function provides a link to the location of the network weights. To download the model, click the link. Unzip the file to a location on the MATLAB® path.

Alternatively, execute the following commands to download and unzip the OpenL3 model to your temporary directory.

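% Download the zipped model to the temporary directory.
downloadFolder = fullfile(tempdir,"OpenL3Download");
loc = websave(downloadFolder,"https://ssd.mathworks.com/supportfiles/audio/openl3.zip");
% Unzip the model and add the extracted openl3 folder to the MATLAB path.
OpenL3Location = tempdir;
unzip(loc,OpenL3Location)
addpath(fullfile(OpenL3Location,"openl3"))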

Extract OpenL3 Embeddings

Read in an audio file.

[audioIn,fs] = audioread('MainStreetOne-16-16-mono-12secs.wav');

Call the openl3Embeddings function with the audio and sample rate to extract OpenL3 feature embeddings from the audio. Using the openl3Embeddings function requires installing the pretrained OpenL3 network. If the network is not installed, the function provides a link to download the pretrained model.

embeddings = openl3Embeddings(audioIn,fs);

The openl3Embeddings function returns a matrix of 512-element feature vectors over time.

[numHops,numElementsPerHop,numChannels] = size(embeddings)

numHops = 
111

numElementsPerHop = 
512

numChannels = 
1

Decrease Time Resolution of OpenL3 Embeddings

Create a 10-second pink noise signal and then extract OpenL3 embeddings. The openl3Embeddings function extracts feature embeddings from mel spectrograms with 90% overlap. Using the openl3Embeddings function requires installing the pretrained OpenL3 network. If the network is not installed, the function provides a link to download the pretrained model.

fs = 16e3;
dur = 10;
audioIn = pinknoise(dur*fs,1,"single");
embeddings = openl3Embeddings(audioIn,fs);

Plot the OpenL3 feature embeddings over time.

surf(embeddings,EdgeColor="none")
view([30 65])
axis tight
xlabel("Feature Index")
ylabel("Frame")
zlabel("Feature Value")
title("OpenL3 Feature Embeddings")

To decrease the time resolution of the OpenL3 feature embeddings, specify a lower percent overlap between consecutive mel spectrograms. Plot the results.

overlapPercentage = 10;
embeddings = openl3Embeddings(audioIn,fs,OverlapPercentage=overlapPercentage);
surf(embeddings,EdgeColor="none")
view([30 65])
axis tight
xlabel("Feature Index")
ylabel("Frame")
zlabel("Feature Value")
title("OpenL3 Feature Embeddings")

Input Arguments

`audioIn` — Input signal
column vector | matrix

Input signal, specified as a column vector or matrix. If you specify a matrix, openl3Embeddings treats the columns of the matrix as individual audio channels.

Data Types: single | double
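
As a minimal sketch of the multichannel behavior described above, the following passes a two-column signal and inspects the size of the result. It assumes the pretrained OpenL3 model is already installed; the 3-second pink noise signal is only an illustrative input.

```matlab
fs = 16e3;
% Two-channel test signal: each column is one channel.
audioIn = pinknoise(3*fs,2,"single");
embeddings = openl3Embeddings(audioIn,fs);
% The third dimension of the output should correspond to the input channels.
[numFrames,embeddingLength,numChannels] = size(embeddings)
```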

`fs` — Sample rate (Hz)
positive scalar

Sample rate of the input signal in Hz, specified as a positive scalar.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: openl3Embeddings(audioIn,fs,SpectrumType="mel256")

`OverlapPercentage` — Percentage overlap between consecutive spectrograms
`90` (default) | scalar in the range [0,100)

Percentage overlap between consecutive spectrograms, specified as a scalar in the range [0,100).

Data Types: single | double
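
To see how the overlap affects the number of frames, the sketch below estimates the frame count from the signal duration. It rests on two assumptions that this page does not state: that the analysis uses 1-second frames and that the audio is processed at 48 kHz, as in the original OpenL3 design. The variable names are illustrative only, and the function's exact buffering may differ.

```matlab
% Assumptions (not stated on this page): 1-second frames at 48 kHz.
frameDur = 1;               % assumed frame duration in seconds
fsModel = 48e3;             % assumed internal sample rate in Hz
dur = 12;                   % signal duration in seconds
overlapPercentage = 90;     % documented default

frameLen = frameDur*fsModel;                            % samples per frame
hopLen = round(frameLen*(1 - overlapPercentage/100));   % samples between frame starts
numFramesEstimate = floor((dur*fsModel - frameLen)/hopLen) + 1
```

Under these assumptions, a 12-second signal with the default overlap gives an estimate of 111 frames, which agrees with the numHops value shown in the Extract OpenL3 Embeddings example.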

`SpectrumType` — Spectrum type
`"mel128"` (default) | `"mel256"` | `"linear"`

Spectrum type generated from audio and used as input to the neural network, specified as "mel128", "mel256", or "linear".

Note

The SpectrumType that you select controls the spectrogram used in the network. See audioPretrainedNetwork or openl3Preprocess for more details.

Data Types: char | string

`EmbeddingLength` — Embedding length
`512` (default) | `6144`

Length of the output audio embedding, specified as 512 or 6144.

Data Types: single | double
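
A short sketch comparing the two embedding lengths, assuming the pretrained OpenL3 model is installed. The 2-second pink noise input is only a placeholder signal.

```matlab
fs = 16e3;
audioIn = pinknoise(2*fs,1,"single");
% Default embedding length (512).
embShort = openl3Embeddings(audioIn,fs);
% Longer embedding (6144 elements per frame).
embLong = openl3Embeddings(audioIn,fs,EmbeddingLength=6144);
% The second dimension of each output is the embedding length.
lengths = [size(embShort,2) size(embLong,2)]
```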

`ContentType` — Audio content type
`"env"` (default) | `"music"`

Audio content type the neural network is trained on, specified as "env" or "music".

Set ContentType to:

"env" when you want to use a model trained on environmental data.
"music" when you want to use a model trained on musical data.

Data Types: char | string
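
As an illustration, the following sketch extracts embeddings from the same signal with the environmental model and with the music model, also selecting a different spectrum type for the second call. It assumes the pretrained OpenL3 model is installed; pink noise stands in for real audio.

```matlab
fs = 16e3;
audioIn = pinknoise(3*fs,1,"single");
% Model trained on environmental data (the default).
embEnv = openl3Embeddings(audioIn,fs,ContentType="env");
% Model trained on musical data, using a 256-band mel spectrogram as input.
embMusic = openl3Embeddings(audioIn,fs,ContentType="music",SpectrumType="mel256");
size(embMusic)
```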

Output Arguments

`embeddings` — Compact representation of audio data
N-by-L-by-C array

Compact representation of audio data, returned as an N-by-L-by-C array, where:

N is the number of buffered frames the audio signal is partitioned into. N depends on the length of audioIn and on OverlapPercentage.
L is the audio embedding length.
C is the number of input channels.

Data Types: single
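
One common way to work with an N-by-L-by-C output is to average over the frame dimension to get a single vector per channel. The sketch below shows only the array manipulation; it uses a random stand-in array rather than real embeddings, and the averaging step is an illustrative choice, not something this function does.

```matlab
% Stand-in for an N-by-L-by-C output (91 frames, 512 elements, 2 channels).
embeddings = rand(91,512,2,"single");
% Average over frames (dimension 1), then reshape to L-by-C.
% reshape is used instead of squeeze so that single-channel input also works.
clipEmbedding = reshape(mean(embeddings,1),size(embeddings,2),[]);
size(clipEmbedding)
```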

References

[1] Cramer, Jason, et al. "Look, Listen, and Learn More: Design Choices for Deep Audio Embeddings." In ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 3852-56. https://doi.org/10.1109/ICASSP.2019.8682475.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
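
A minimal sketch of GPU usage, assuming Parallel Computing Toolbox, a supported GPU, and an installed OpenL3 model. It falls back to the CPU when no GPU is available, and it assumes the result follows the input's array type.

```matlab
fs = 16e3;
audioIn = pinknoise(2*fs,1,"single");
% Move the audio to the GPU only if a supported device is available.
if canUseGPU
    audioIn = gpuArray(audioIn);
end
embeddings = openl3Embeddings(audioIn,fs);
% Bring the result back to host memory (no effect for ordinary arrays).
embeddings = gather(embeddings);
```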

Version History

Introduced in R2022a
