audioFeatureExtractor

Streamline audio feature extraction

Description

`audioFeatureExtractor` encapsulates multiple audio feature extractors into a streamlined and modular implementation.

Creation

Syntax

``aFE = audioFeatureExtractor()``
``aFE = audioFeatureExtractor(Name=Value)``

Description

`aFE = audioFeatureExtractor()` creates an audio feature extractor with default property values.

`aFE = audioFeatureExtractor(Name=Value)` specifies nondefault properties for `aFE` using one or more name-value arguments.

Properties

Main Properties

`Window` –– Analysis window, specified as a real vector.

Data Types: `single` | `double`

`OverlapLength` –– Overlap length of adjacent analysis windows, specified as an integer in the range [0, `numel(Window)`).

Data Types: `single` | `double`

`FFTLength` –– FFT length, specified as an integer. The default value of `[]` means that the FFT length is equal to the window length, `numel(Window)`.

Data Types: `single` | `double`

`SampleRate` –– Input sample rate in Hz, specified as a positive scalar.

Data Types: `single` | `double`

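`SpectralDescriptorInput` –– Input to spectral descriptors, specified as `"linearSpectrum"`, `"melSpectrum"`, `"barkSpectrum"`, or `"erbSpectrum"`.

The spectral descriptors affected by this property are `spectralCentroid`, `spectralCrest`, `spectralDecrease`, `spectralEntropy`, `spectralFlatness`, `spectralFlux`, `spectralKurtosis`, `spectralRolloffPoint`, `spectralSkewness`, `spectralSlope`, and `spectralSpread`.

The spectrum input to the spectral descriptors is the same as the output from the corresponding feature (`linearSpectrum`, `melSpectrum`, `barkSpectrum`, or `erbSpectrum`).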

For example, if you set `SpectralDescriptorInput` to `"barkSpectrum"`, and `spectralCentroid` to `true`, then `aFE` returns the centroid of the default Bark spectrum.

```
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
aFE = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true);
barkSpectralCentroid = extract(aFE,audioIn);
```
If you specify a nondefault `barkSpectrum` using `setExtractorParameters`, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call `setExtractorParameters(aFE,"barkSpectrum",NumBands=40)`, then `aFE` returns the centroid of a 40-band Bark spectrum.

```
setExtractorParameters(aFE,"barkSpectrum",NumBands=40)
bark40SpectralCentroid = extract(aFE,audioIn);
```

Data Types: `char` | `string`

`FeatureVectorLength` –– Total number of features output from `extract` for the current object configuration, specified as a positive integer. `FeatureVectorLength` is equal to the size of the second dimension of the output from the `extract` function.

Data Types: `single` | `double`
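
The relationship between `FeatureVectorLength` and the output of `extract` can be checked directly. The following is a minimal sketch, assuming Audio Toolbox is installed; the noise input and the choice of enabled features are illustrative only.

```matlab
% Illustrative input: one second of white noise (not from the original page).
fs = 16e3;
audioIn = randn(fs,1);

% Enable two features. With the documented default of 13 MFCC coefficients,
% this configuration is expected to produce 13 + 1 feature columns.
aFE = audioFeatureExtractor(SampleRate=fs,mfcc=true,pitch=true);

features = extract(aFE,audioIn);
numFeatures = aFE.FeatureVectorLength;

% The second dimension of the extracted matrix matches FeatureVectorLength.
assert(size(features,2) == numFeatures)
```

Reading `FeatureVectorLength` before calling `extract` is a convenient way to preallocate downstream buffers.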

Features to Extract

`linearSpectrum` –– Extract the one-sided linear spectrum, specified as `true` or `false`.

To set parameters of the linear spectrum extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"linearSpectrum",Name=Value)`
Settable parameters for the linear spectrum extraction are:

• `FrequencyRange` –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, `FrequencyRange` defaults to `[0, SampleRate/2]`.

• `SpectrumType` –– Spectrum type, specified as `"power"` or `"magnitude"`. If unspecified, `SpectrumType` defaults to `"power"`.

• `WindowNormalization` –– Apply window normalization, specified as `true` or `false`. If unspecified, `WindowNormalization` defaults to `true`.

Data Types: `logical`
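
As a hedged illustration of the parameters listed above, the sketch below restricts the linear spectrum to a band and switches to a magnitude spectrum. It assumes Audio Toolbox is installed; the noise input and the 0 to 4000 Hz band are arbitrary choices made for the example.

```matlab
% Illustrative input: one second of white noise.
fs = 16e3;
audioIn = randn(fs,1);

aFE = audioFeatureExtractor(SampleRate=fs,linearSpectrum=true);

% Keep only the bins between 0 and 4000 Hz and return magnitudes
% instead of the default power spectrum.
setExtractorParameters(aFE,"linearSpectrum", ...
    SpectrumType="magnitude", ...
    FrequencyRange=[0 4000]);

spec = extract(aFE,audioIn);   % numHops-by-numBins matrix
numBins = size(spec,2);        % fewer bins than the full one-sided spectrum
```

Narrowing `FrequencyRange` reduces the number of returned bins, which in turn reduces `FeatureVectorLength`.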

`melSpectrum` –– Extract the one-sided mel spectrum, specified as `true` or `false`.

To set parameters of the mel spectrum extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"melSpectrum",Name=Value)`
Settable parameters for the mel spectrum extraction are:

• `FrequencyRange` –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, `FrequencyRange` defaults to `[0, SampleRate/2]`.

• `SpectrumType` –– Spectrum type, specified as `"power"` or `"magnitude"`. If unspecified, `SpectrumType` defaults to `"power"`.

• `NumBands` –– Number of mel bands, specified as an integer. If unspecified, `NumBands` defaults to `32`.

• `FilterBankNormalization` –– Normalization applied to bandpass filters, specified as `"bandwidth"`, `"area"`, or `"none"`. If unspecified, `FilterBankNormalization` defaults to `"bandwidth"`.

• `WindowNormalization` –– Apply window normalization, specified as `true` or `false`. If unspecified, `WindowNormalization` defaults to `true`.

• `FilterBankDesignDomain` –– Domain in which the filter bank is designed, specified as either `"linear"` or `"warped"`. If unspecified, `FilterBankDesignDomain` defaults to `"linear"`.

Data Types: `logical`

`barkSpectrum` –– Extract the one-sided Bark spectrum, specified as `true` or `false`.

To set parameters of the Bark spectrum extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"barkSpectrum",Name=Value)`
Settable parameters for the Bark spectrum extraction are:

• `FrequencyRange` –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, `FrequencyRange` defaults to `[0, SampleRate/2]`.

• `SpectrumType` –– Spectrum type, specified as `"power"` or `"magnitude"`. If unspecified, `SpectrumType` defaults to `"power"`.

• `NumBands` –– Number of Bark bands, specified as an integer. If unspecified, `NumBands` defaults to `32`.

• `FilterBankNormalization` –– Normalization applied to bandpass filters, specified as `"bandwidth"`, `"area"`, or `"none"`. If unspecified, `FilterBankNormalization` defaults to `"bandwidth"`.

• `WindowNormalization` –– Apply window normalization, specified as `true` or `false`. If unspecified, `WindowNormalization` defaults to `true`.

• `FilterBankDesignDomain` –– Domain in which the filter bank is designed, specified as either `"linear"` or `"warped"`. If unspecified, `FilterBankDesignDomain` defaults to `"linear"`.

Data Types: `logical`

`erbSpectrum` –– Extract the one-sided ERB spectrum, specified as `true` or `false`.

To set parameters of the ERB spectrum extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"erbSpectrum",Name=Value)`
Settable parameters for the ERB spectrum extraction are:

• `FrequencyRange` –– Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, `FrequencyRange` defaults to `[0, SampleRate/2]`.

• `SpectrumType` –– Spectrum type, specified as `"power"` or `"magnitude"`. If unspecified, `SpectrumType` defaults to `"power"`.

• `NumBands` –– Number of ERB bands, specified as an integer. If unspecified, `NumBands` defaults to `ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1)))`.

• `FilterBankNormalization` –– Normalization applied to bandpass filters, specified as `"bandwidth"`, `"area"`, or `"none"`. If unspecified, `FilterBankNormalization` defaults to `"bandwidth"`.

• `WindowNormalization` –– Apply window normalization, specified as `true` or `false`. If unspecified, `WindowNormalization` defaults to `true`.

Data Types: `logical`

`mfcc` –– Extract mel-frequency cepstral coefficients (MFCC), specified as `true` or `false`.

To set parameters of the MFCC extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"mfcc",Name=Value)`
Settable parameters for the MFCC extraction are:

• `NumCoeffs` –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, `NumCoeffs` defaults to `13`.

• `DeltaWindowLength` –– Delta window length, specified as an odd integer greater than 2. If unspecified, `DeltaWindowLength` defaults to `9`. This parameter affects the `mfccDelta` and `mfccDeltaDelta` features.

• `Rectification` –– Type of nonlinear rectification, specified as `"log"` or `"cubic-root"`.

The mel-frequency cepstral coefficients are calculated using the melSpectrum.

Data Types: `logical`
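
The effect of `NumCoeffs` on the output layout can be inspected with `info` without extracting anything. This is a minimal sketch assuming Audio Toolbox is installed; the values 20 and 5 are arbitrary illustrative choices.

```matlab
aFE = audioFeatureExtractor(SampleRate=16e3,mfcc=true,mfccDelta=true);

% Request 20 coefficients per window and a shorter delta window.
% DeltaWindowLength must be an odd integer greater than 2.
setExtractorParameters(aFE,"mfcc",NumCoeffs=20,DeltaWindowLength=5);

% info maps each enabled feature to its output columns.
idx = info(aFE);
numMfcc = numel(idx.mfcc);        % columns occupied by the MFCC
numDelta = numel(idx.mfccDelta);  % columns occupied by the delta MFCC
```

Because the delta features are derived from the MFCC, changing `NumCoeffs` changes the width of `mfccDelta` as well.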

`mfccDelta` –– Extract delta of MFCC, specified as `true` or `false`.

The delta MFCC is calculated based on the extracted MFCC. Parameters set on `mfcc` affect `mfccDelta`.

Data Types: `logical`

`mfccDeltaDelta` –– Extract delta-delta of MFCC, specified as `true` or `false`.

The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on `mfcc` affect `mfccDeltaDelta`.

Data Types: `logical`

`gtcc` –– Extract gammatone cepstral coefficients (GTCC), specified as `true` or `false`.

To set parameters of the GTCC extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"gtcc",Name=Value)`
Settable parameters for the GTCC extraction are:

• `NumCoeffs` –– Number of coefficients returned for each window, specified as a positive integer. If unspecified, `NumCoeffs` defaults to `13`.

• `DeltaWindowLength` –– Delta window length, specified as an odd integer greater than 2. If unspecified, `DeltaWindowLength` defaults to `9`. This parameter affects the `gtccDelta` and `gtccDeltaDelta` features.

• `Rectification` –– Type of nonlinear rectification, specified as `"log"` or `"cubic-root"`.

The gammatone cepstral coefficients are calculated using the erbSpectrum.

Data Types: `logical`

`gtccDelta` –– Extract delta of GTCC, specified as `true` or `false`.

The delta GTCC is calculated based on the extracted GTCC. Parameters set on `gtcc` affect `gtccDelta`.

Data Types: `logical`

`gtccDeltaDelta` –– Extract delta-delta of GTCC, specified as `true` or `false`.

The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on `gtcc` affect `gtccDeltaDelta`.

Data Types: `logical`

`spectralCentroid` –– Extract spectral centroid, specified as `true` or `false`.

The spectral centroid is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralCrest` –– Extract spectral crest, specified as `true` or `false`.

The spectral crest is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralDecrease` –– Extract spectral decrease, specified as `true` or `false`.

The spectral decrease is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralEntropy` –– Extract spectral entropy, specified as `true` or `false`.

The spectral entropy is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralFlatness` –– Extract spectral flatness, specified as `true` or `false`.

The spectral flatness is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralFlux` –– Extract spectral flux, specified as `true` or `false`.

The spectral flux is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

To set parameters of the spectral flux extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"spectralFlux",Name=Value)`
Settable parameters for the spectral flux extraction are:

• `NormType` –– Norm type used to calculate the spectral flux, specified as `1` or `2`. If unspecified, `NormType` defaults to `2`.

Data Types: `logical`

`spectralKurtosis` –– Extract spectral kurtosis, specified as `true` or `false`.

The spectral kurtosis is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralRolloffPoint` –– Extract spectral rolloff point, specified as `true` or `false`.

The spectral rolloff point is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

To set parameters of the spectral rolloff point extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"spectralRolloffPoint",Name=Value)`
Settable parameters for the spectral rolloff point extraction are:

• `Threshold` –– Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, `Threshold` defaults to `0.95`.

Data Types: `logical`
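
To make the role of `Threshold` concrete, the sketch below extracts the rolloff point twice from the same signal, once with the default threshold and once with a lower one. It assumes Audio Toolbox is installed; the noise input and the value 0.85 are illustrative.

```matlab
% Illustrative input: one second of white noise.
fs = 16e3;
audioIn = randn(fs,1);

aFE = audioFeatureExtractor(SampleRate=fs,spectralRolloffPoint=true);

% Rolloff point with the default Threshold of 0.95.
r95 = extract(aFE,audioIn);

% Lower the threshold and extract again from the same signal.
setExtractorParameters(aFE,"spectralRolloffPoint",Threshold=0.85);
r85 = extract(aFE,audioIn);
```

Because the rolloff point marks the frequency below which the given fraction of the spectral energy lies, a lower threshold cannot yield a higher rolloff frequency for the same frame.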

`spectralSkewness` –– Extract spectral skewness, specified as `true` or `false`.

The spectral skewness is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralSlope` –– Extract spectral slope, specified as `true` or `false`.

The spectral slope is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`spectralSpread` –– Extract spectral spread, specified as `true` or `false`.

The spectral spread is calculated on the linear, mel, Bark, or ERB spectrum, as specified by the `SpectralDescriptorInput` property.

Data Types: `logical`

`pitch` –– Extract pitch, specified as `true` or `false`.

To set parameters of the pitch extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"pitch",Name=Value)`
Settable parameters for the pitch extraction are:

• `Method` –– Method used to calculate the pitch, specified as `"PEF"`, `"NCF"`, `"CEP"`, `"LHS"`, or `"SRH"`. If unspecified, `Method` defaults to `"NCF"`. For a description of available pitch extraction methods, see `pitch`.

• `Range` –– Range in Hz within which to search for the pitch, specified as a two-element row vector of increasing values. If unspecified, `Range` defaults to `[50,400]`.

• `MedianFilterLength` –– Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, `MedianFilterLength` defaults to `1` (no median filtering).

Data Types: `logical`
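
A short sketch of the pitch parameters follows, assuming Audio Toolbox is installed. The 200 Hz test tone, the search range, and the filter length are illustrative values chosen for the example; `Method` is left at its default.

```matlab
% Illustrative input: a one-second 200 Hz sine tone.
fs = 16e3;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*200*t);

aFE = audioFeatureExtractor(SampleRate=fs,pitch=true);

% Narrow the search range around the expected pitch and smooth
% the estimates with a 3-point median filter.
setExtractorParameters(aFE,"pitch",Range=[100,300],MedianFilterLength=3);

f0 = extract(aFE,audioIn);   % one pitch estimate in Hz per analysis window
```

Narrowing `Range` to the band where the pitch is expected generally reduces octave errors, at the cost of missing pitches outside that band.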

`harmonicRatio` –– Extract harmonic ratio, specified as `true` or `false`.

Data Types: `logical`

`zerocrossrate` –– Extract zero-crossing rate, specified as `true` or `false`.

To set parameters of the zero-crossing rate extraction, use `setExtractorParameters`:

`setExtractorParameters(aFE,"zerocrossrate",Name=Value)`
Settable parameters for the zero-crossing rate extraction are:

• `Method` –– Method for computing the zero-crossing rate, specified as `"difference"` or `"comparison"`. If unspecified, `Method` defaults to `"difference"`. For more information, see `zerocrossrate`.

• `Level` –– Signal level for which the crossing rate is computed, specified as a real scalar. `audioFeatureExtractor` subtracts the `Level` value from the signal and then finds the zero crossings. If unspecified, `Level` defaults to `0`.

• `Threshold` –– Threshold above and below the `Level` value over which the crossing rate is computed, specified as a real scalar. `audioFeatureExtractor` sets all the values of the input in the range `[-Threshold, Threshold]` to `0` and then finds the zero crossings. If unspecified, `Threshold` defaults to `0`.

• `TransitionEdge` –– Transitions to include when counting zero crossings, specified as `"falling"`, `"rising"`, or `"both"`. If you specify `"falling"`, only negative-going transitions are counted. If you specify `"rising"`, only positive-going transitions are counted. If unspecified, `TransitionEdge` defaults to `"both"`.

• `ZeroPositive` –– Sign convention, specified as a logical scalar. If you specify `ZeroPositive` as `true`, then `0` is considered positive. If you specify `ZeroPositive` as `false`, then `audioFeatureExtractor` considers `0`, `-1`, and `+1` to have distinct signs following the convention of the `sign` function. If unspecified, `ZeroPositive` defaults to `false`.

Data Types: `logical`
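
The `TransitionEdge` parameter can be illustrated by comparing the rate computed over both edges with the rate computed over rising edges only. This is a sketch assuming Audio Toolbox is installed; the 440 Hz tone and its phase offset are arbitrary illustrative choices.

```matlab
% Illustrative input: a one-second 440 Hz tone with a small phase offset
% so that no sample falls exactly on zero at t = 0.
fs = 16e3;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*440*t + 0.1);

aFE = audioFeatureExtractor(SampleRate=fs,zerocrossrate=true);

% Default: count both rising and falling transitions.
zcrBoth = extract(aFE,audioIn);

% Count only positive-going transitions.
setExtractorParameters(aFE,"zerocrossrate",TransitionEdge="rising");
zcrRising = extract(aFE,audioIn);
```

Counting only one edge can never produce a higher rate than counting both, so `zcrRising` is bounded above by `zcrBoth` in every analysis window.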

`shortTimeEnergy` –– Extract short-time energy, specified as `true` or `false`. The short-time energy is computed using

`sTE = sum(xbw.^2,1)`,

where `xbw` is the buffered and windowed signal.

Example: Chirp Function

Generate a chirp sampled at 1 kHz for 3 seconds. The instantaneous frequency is 100 Hz at $t=0$ and crosses 200 Hz at $t=1$ second. Divide the signal into 103-sample segments with 43 samples of overlap between adjoining segments. Window each segment with a periodic Hamming window.

```
fs = 1e3;
x = chirp(0:1/fs:3,100,1,200)';

win = hamming(103,"periodic");
nover = 43;

[xb,~] = buffer(x,length(win),nover,"nodelay");
xbw = xb.*win;
```

Compute the short-time energy using the definition.

`Edef = sum(xbw.^2,1)';`

Use `audioFeatureExtractor` to compute the short-time energy.

```
EaFE = extract(audioFeatureExtractor(shortTimeEnergy=true, ...
    SampleRate=fs,Window=win,OverlapLength=nover),x);
```

Verify that both procedures give the same short-time energy.

`dff = max(abs(EaFE-Edef))`
```dff = 0 ```

Data Types: `logical`

Object Functions

• `extract` –– Extract audio features

• `setExtractorParameters` –– Set nondefault parameter values for individual feature extractors

• `info` –– Output mapping and individual feature extractor parameters

• `generateMATLABFunction` –– Create MATLAB function compatible with C/C++ code generation

Examples

Read in an audio signal.

`[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");`

Create an `audioFeatureExtractor` object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.

```
aFE = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);
```

Call `extract` to extract the audio features from the audio signal.

`features = extract(aFE,audioIn);`

Use `info` to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

`idx = info(aFE)`
```
idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41
       zerocrossrate: 42
     shortTimeEnergy: 43
```

Plot the detected pitch over time.

```
t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("Pitch")
xlabel("Time (s)")
ylabel("Frequency (Hz)")
```

Plot the zero-crossing rate over time.

```
plot(t,features(:,idx.zerocrossrate))
title("Zero-Crossing Rate")
xlabel("Time (s)")
```

Plot the short-time energy over time.

```
plot(t,features(:,idx.shortTimeEnergy))
title("Short-Time Energy")
xlabel("Time (s)")
```

Create an audio datastore that points to audio samples included with Audio Toolbox®.

```
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);
```

Find all files that correspond to a sample rate of 44.1 kHz and then `subset` the datastore.

```
keepFile = cellfun(@(x)contains(x,"44p1"),ads.Files);
ads = subset(ads,keepFile);
```

Convert the data to a `tall` array. `tall` arrays are evaluated only when you request them explicitly using `gather`. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple workers. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.

`adsTall = tall(ads)`
```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

adsTall =

  M×1 tall cell array

    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :        :
        :        :
```

Create an `audioFeatureExtractor` object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

```
aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);
```

Define a `cellfun` function so that audio features are extracted from each cell of the tall array. Call `gather` to evaluate the tall array.

```
specsTall = cellfun(@(x)extract(aFE,x),adsTall,UniformOutput=false);
specs = gather(specsTall);
```
```
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 14 sec
Evaluation completed in 14 sec
```

The `specs` variable returned from `gather` is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the number of features requested from the audio data.

`numFiles = numel(specs)`
```numFiles = 12 ```
`[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})`
```numHops1 = 1053 ```
```numFeaturesFile1 = 620 ```
```numChannelsFile1 = 1 ```
`[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})`
```numHops2 = 443 ```
```numFeaturesFile2 = 620 ```
```numChannelsFile2 = 1 ```
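
The reported sizes can be reproduced by hand. The sketch below is plain MATLAB arithmetic and makes several assumptions: that the first two entries of the tall array display correspond to `specs{1}` and `specs{2}`, that the default window length is 1024 with an overlap of 512 (as in the object display under Algorithms), and that the one-sided linear spectrum of an N-point FFT has N/2+1 bins.

```matlab
% Signal lengths taken from the first two rows of the tall array display.
numSamples = [539648 227497];

% Assumed defaults: 1024-sample window, 512-sample overlap.
winLength = 1024;
overlapLength = 512;
hopLength = winLength - overlapLength;

% Number of complete analysis windows that fit in each signal.
numHops = floor((numSamples - winLength)/hopLength) + 1;

% Feature count: the linear spectrum (assumed N/2+1 bins) plus the
% documented default of 32 mel bands and 32 Bark bands.
numLinear = winLength/2 + 1;
numMel = 32;
numBark = 32;

% The remaining columns of the 620 reported features belong to the
% ERB spectrum under these assumptions.
numErb = 620 - (numLinear + numMel + numBark);
```

Under these assumptions, `numHops` evaluates to `[1053 443]`, matching `numHops1` and `numHops2`, and the remainder `numErb` is 43.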

Algorithms

The `audioFeatureExtractor` creates a feature extraction pipeline based on your selected features. To reduce computations, `audioFeatureExtractor` reuses intermediary representations and outputs some intermediate representations as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the `audioFeatureExtractor` as follows.

```
aFE = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    spectralFlux=true, ...
    pitch=true, ...
    harmonicRatio=true, ...
    mfccDeltaDelta=true)
```
```
aFE = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread

   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.
```
This configuration corresponds to the highlighted feature extraction pipeline.

Note

Because `audioFeatureExtractor` reuses intermediary representations, the features output from `audioFeatureExtractor` might not correspond with the default configuration of features output by corresponding individual feature extractors.

Version History

Introduced in R2019b

Behavior changed in R2020b