cepstralFeatureExtractor

(Removed) Extract cepstral features from audio segment

The cepstralFeatureExtractor System object™ has been removed. For more information, see Version History.

Description

The cepstralFeatureExtractor System object extracts cepstral features from an audio segment. Cepstral features are commonly used to characterize speech and music signals.

To extract cepstral features:

1. Create the cepstralFeatureExtractor object and set its properties.
2. Call the object with arguments, as if it were a function.

To learn more about how System objects work, see What Are System Objects?

Creation

Syntax

cepFeatures = cepstralFeatureExtractor

cepFeatures = cepstralFeatureExtractor(Name,Value)

Description

cepFeatures = cepstralFeatureExtractor creates a System object, cepFeatures, that calculates cepstral features independently across each input channel. Columns of the input are treated as individual channels.

cepFeatures = cepstralFeatureExtractor(Name,Value) sets each property Name to the specified Value. Unspecified properties have default values.

Example: cepFeatures = cepstralFeatureExtractor('InputDomain','Frequency','SampleRate',fs,'LogEnergy','Replace') accepts a signal in the frequency domain, sampled at fs Hz. The first element of the coefficients vector is replaced by the log energy value.

Properties

Unless otherwise indicated, properties are nontunable, which means you cannot change their values after calling the object. Objects lock when you call them, and the release function unlocks them.

If a property is tunable, you can change its value at any time.

For more information on changing property values, see System Design in MATLAB Using System Objects.

`FilterBank` — Type of filter bank
`'Mel'` (default) | `'Gammatone'`

Type of filter bank, specified as either 'Mel' or 'Gammatone'. When FilterBank is set to 'Mel', the object computes the mel frequency cepstral coefficients (MFCC). When FilterBank is set to 'Gammatone', the object computes the gammatone cepstral coefficients (GTCC).

Data Types: char | string

`InputDomain` — Domain of input signal
`'Time'` (default) | `'Frequency'`

Domain of the input signal, specified as either 'Time' or 'Frequency'.

Data Types: char | string

`NumCoeffs` — Number of coefficients to return
`13` (default) | positive integer

Number of coefficients to return, specified as an integer in the range [2, v], where v is the number of valid passbands. The number of valid passbands depends on the type of filter bank:

Mel –– The number of valid passbands is defined as sum(BandEdges <= floor(SampleRate/2))-2.
Gammatone –– The number of valid passbands is defined as ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).

Data Types: single | double
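
As a rough illustration of the mel expression above, the following sketch counts the valid passbands for a made-up set of band edges. The bandEdges vector is a hypothetical example and not the object's default BandEdges value.

```matlab
% Upper bound on NumCoeffs for a mel filter bank.
sampleRate = 16000;                                   % default SampleRate (Hz)
bandEdges = [100 200 300 400 600 900 1400 2200 3500 5500 9000];  % hypothetical (Hz)

% Count the band edges at or below the Nyquist frequency.
usableEdges = sum(bandEdges <= floor(sampleRate/2));

% Each triangular filter spans three consecutive edges, so N usable
% edges yield N-2 valid passbands.
numValidPassbands = usableEdges - 2;

% NumCoeffs must then lie in the range [2, numValidPassbands].
disp(numValidPassbands)
```

In this sketch, ten of the eleven edges fall at or below 8000 Hz, so the number of valid passbands is 8.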

`Rectification` — Nonlinear rectification type
`'Log'` (default) | `'Cubic-Root'`

Nonlinear rectification type, specified as 'Log' or 'Cubic-Root'.

Data Types: char | string

`FFTLength` — FFT length
`[]` (default) | positive integer

FFT length, specified as a positive integer. The default, [], means that the FFT length is equal to the number of rows in the input signal.

Dependencies

To enable this property, set InputDomain to 'Time'.

`LogEnergy` — Specify how the log energy is shown
`'Append'` (default) | `'Replace'` | `'Ignore'`

How the log energy appears in the output coefficients vector, specified as one of these values:

'Append' –– The object prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The object does not calculate or return the log energy.

Data Types: char | string
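
The three settings can be pictured with a small sketch. The numeric values below are placeholders standing in for the object's internal results, and the 'Ignore' branch assumes the output then holds only the NumCoeffs cepstral coefficients.

```matlab
% Placeholder values (hypothetical, not computed from audio).
numCoeffs = 13;
cc = (1:numCoeffs).';     % stand-in for the cepstral coefficients
logE = -2.5;              % stand-in for the log energy

logEnergy = 'Append';     % try 'Replace' or 'Ignore'
switch logEnergy
    case 'Append'
        coeffs = [logE; cc];          % log energy first, length 1 + NumCoeffs
    case 'Replace'
        coeffs = [logE; cc(2:end)];   % first coefficient overwritten, length NumCoeffs
    case 'Ignore'
        coeffs = cc;                  % log energy not included (assumed layout)
end
disp(numel(coeffs))
```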

`SampleRate` — Input sample rate (Hz)
`16000` (default) | positive scalar

Input sample rate in Hz, specified as a real positive scalar.

Data Types: single | double

Advanced properties

`BandEdges` — Band edges of mel filter bank (Hz)
row vector

Band edges of the filter bank in Hz, specified as a nonnegative, monotonically increasing row vector. The maximum band edge frequency can be any finite number. The number of band edges must be in the range [4, 80].

The default band edges are spaced linearly for the first ten and then logarithmically after. The default band edges are set as recommended by [1].

Dependencies

To enable this property, set FilterBank to 'Mel'.

Data Types: single | double

`FrequencyRange` — Frequency range of gammatone filter bank (Hz)
`[50 8000]` (default) | two-element row vector

Frequency range of the filter bank in Hz, specified as a positive, monotonically increasing two-element row vector. The maximum frequency can be any finite number. The center frequencies of the filter bank are equally spaced between hz2erb(FrequencyRange(1)) and hz2erb(FrequencyRange(2)) on the ERB scale.

Dependencies

To enable this property, set FilterBank to 'Gammatone'.

Data Types: single | double

`FilterBankDesignDomain` — Domain for mel filter bank design
`'Hz'` (default) | `'Bin'`

Domain for filter bank design, specified as either 'Hz' or 'Bin'. The filter bank is designed as overlapped triangles with band edges specified by the BandEdges property.

The BandEdges property is specified in Hz. When you set the design domain to:

'Hz' –– Filter bank triangles are drawn in Hz and are mapped onto bins.

Here is an example that plots the filter bank in bins when the FilterBankDesignDomain is set to 'Hz':

[audioFile, fs] = audioread('NoisySpeech-16-22p5-mono-5secs.wav');
duration = round(0.02*fs); % 20 ms audio segment
audioSegment = audioFile(5500:5500+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)

cepFeatures = 
 cepstralFeatureExtractor with properties:

   Properties
                InputDomain: 'Time'
                  NumCoeffs: 13
                  FFTLength: []
                  LogEnergy: 'Append'
                 SampleRate: 22500

   Advanced Properties
                  BandEdges: [1×42 double]
     FilterBankDesignDomain: 'Hz'
    FilterBankNormalization: 'Bandwidth'

Pass the audio segment as an input to the cepstral feature extractor algorithm to lock the object.

[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);

Use the getFilters function to get the filter bank. Plot the filter bank.

[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))

For details, see [1].

'Bin' –– The band edge frequencies in Hz are converted to bins. The filter bank triangles are drawn symmetrically in bins.
Change the FilterBankDesignDomain property to 'Bin':
```
release(cepFeatures);
cepFeatures.FilterBankDesignDomain = 'Bin';
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
```
For details, see [2].

Dependencies

To enable this property, set FilterBank to 'Mel'.

Data Types: char | string

`FilterBankNormalization` — Normalize filter bank
`'Bandwidth'` (default) | `'Area'` | `'None'`

Normalization technique used on the weights of the filter bank, specified as:

'Bandwidth' –– The weights of each bandpass filter are normalized by the corresponding bandwidth of the filter.
'Area' –– The weights of each bandpass filter are normalized by the corresponding area of the bandpass filter.
'None' –– The weights of the filter are not normalized.

Data Types: char | string

Usage

Syntax

[coeffs,delta,deltaDelta] = cepFeatures(audioIn)

Description

[coeffs,delta,deltaDelta] = cepFeatures(audioIn) returns the cepstral coefficients, the delta, and the delta-delta. Depending on the LogEnergy property, coeffs also contains the log energy.

The log energy value prepends the coefficient vector or replaces the first element of the coefficients vector based on whether you set the LogEnergy property to 'Append' or 'Replace'. For details, see coeffs.

Input Arguments

`audioIn` — Input signal
column vector | matrix

Input signal, specified as a column vector or matrix. If InputDomain is set to 'Time', specify audioIn as a real-valued frame of audio data. If InputDomain is set to 'Frequency', specify audioIn as a real- or complex-valued discrete Fourier transform. If specified as a matrix, the columns are treated as independent audio channels.

Data Types: single | double
Complex Number Support: Yes

Output Arguments

`coeffs` — Cepstral coefficients
column vector | matrix

Cepstral coefficients, returned as a column vector or a matrix. If the coefficients matrix is an N-by-M matrix, N is determined by the values you specify in NumCoeffs and LogEnergy properties. M equals the number of input audio channels.

When the LogEnergy property is set to:

'Append' –– The object prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs. This is the default setting of the LogEnergy property.
'Replace' –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The object does not calculate or return the log energy.

Data Types: single | double

`delta` — Change in coefficients
column vector | matrix

Change in coefficients over consecutive calls to the algorithm, returned as a vector or a matrix. The delta array is of the same size and data type as the coeffs array.

In this example, cepFeatures is a cepstral feature extractor that accepts an audio input signal sampled at 12 kHz. Stream in three segments of the audio signal on three consecutive calls to the object, with audioIn holding a new segment on each call. Return the cepstral coefficients and the corresponding delta values.

cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1] = cepFeatures(audioIn);
[coeff2,delta2] = cepFeatures(audioIn);
[coeff3,delta3] = cepFeatures(audioIn);

delta2 is computed as coeff2-coeff1, while delta3 is computed as coeff3-coeff2. The initial array, delta1, is an array of zeros.

Data Types: single | double

`deltaDelta` — Change in delta values
column vector | matrix

Change in delta values over consecutive calls to the algorithm, returned as a vector or a matrix. The deltaDelta array is the same size and data type as the coeffs and delta arrays.

In this example, consecutive calls to the cepstral feature extractor algorithm return the deltaDelta values in addition to the coefficients and the delta values.

cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1,deltaDelta1] = cepFeatures(audioIn);
[coeff2,delta2,deltaDelta2] = cepFeatures(audioIn);
[coeff3,delta3,deltaDelta3] = cepFeatures(audioIn);

deltaDelta2 is computed as delta2-delta1, while deltaDelta3 is computed as delta3-delta2. The initial array, deltaDelta1, is an array of zeros.

Data Types: single | double
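
The bookkeeping behind delta and deltaDelta can be sketched in plain MATLAB. This is only an illustration of the differences described above, not the object's implementation; the coefficient values and the variable names prevCoeffs and prevDelta are hypothetical.

```matlab
% Placeholder coefficient vectors for three consecutive calls.
% Each column stands in for the coeffs output of one call.
coeffFrames = [1 4 9; 2 3 5];

prevCoeffs = [];   % state carried between calls
prevDelta = [];
for k = 1:size(coeffFrames,2)
    coeffs = coeffFrames(:,k);
    if isempty(prevCoeffs)
        % First call: no history, so both outputs are zeros.
        delta = zeros(size(coeffs));
        deltaDelta = zeros(size(coeffs));
    else
        delta = coeffs - prevCoeffs;      % change in coefficients
        deltaDelta = delta - prevDelta;   % change in delta
    end
    prevCoeffs = coeffs;
    prevDelta = delta;
end
```

On the second call, deltaDelta equals delta because the previous delta is all zeros, which matches the behavior shown in the example on this page.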

Object Functions

To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named obj, use this syntax:

release(obj)

Specific to cepstralFeatureExtractor

`getFilters`	Get auditory filter bank

Common to All System Objects

`clone`	Create duplicate System object
`isLocked`	Determine if System object is in use
`release`	Release resources and allow changes to System object property values and input characteristics
`reset`	Reset internal states of System object
`step`	Run System object algorithm

Examples

Get MFCC Data for Speech Segment

Extract the mel frequency cepstral coefficients and the log energy values of segments in a speech file. Return delta, the difference between the current and the previous cepstral coefficients, and deltaDelta, the difference between the current and the previous delta values. The log energy value that the object computes either prepends the coefficients vector or replaces its first element, depending on whether you set the LogEnergy property of the cepstralFeatureExtractor object to 'Append' or 'Replace'.

Read an audio signal from the 'Counting-16-44p1-mono-15secs.wav' file. Extract a 40 ms segment from the audio data. Create a cepstralFeatureExtractor object. The cepstral coefficients computed by the default object are the mel frequency cepstral coefficients. In addition, the object computes the log energy, delta, and delta-delta values of the audio segment.

[audioFile, fs] = audioread('Counting-16-44p1-mono-15secs.wav');
duration = round(0.04*fs); % 40 ms
audioSegment = audioFile(40000:40000+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)

cepFeatures = 
  cepstralFeatureExtractor with properties:

   Properties
       FilterBank: 'Mel'
      InputDomain: 'Time'
        NumCoeffs: 13
    Rectification: 'Log'
        FFTLength: []
        LogEnergy: 'Append'
       SampleRate: 44100

  Show all properties

The LogEnergy property is set to 'Append'. The first element in the coefficients vector is the log energy value and the remaining elements are the 13 cepstral coefficients computed by the object. The number of cepstral coefficients is determined by the value you specify in the NumCoeffs property.

[coeffs,delta,deltaDelta] = cepFeatures(audioSegment)

coeffs = 14×1

    5.2999
   -4.9406
    3.6130
    0.4397
   -0.2280
   -1.1068
    0.6679
    0.6367
   -0.3869
    0.6127
      ⋮

delta = 14×1

     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
      ⋮

deltaDelta = 14×1

     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
      ⋮

The initial values for the delta and deltaDelta arrays are always zero. Consider another 40 ms audio segment in the file and extract the cepstral features from this segment.

audioSegmentTwo = audioFile(5820:5820+duration-1);
[coeffsTwo,deltaTwo,deltaDeltaTwo] = cepFeatures(audioSegmentTwo)

coeffsTwo = 14×1

   -0.1582
  -15.9507
    2.4295
    0.2835
    0.4345
    0.4382
    0.6040
    0.4168
    0.1846
    0.2636
      ⋮

deltaTwo = 14×1

   -5.4581
  -11.0101
   -1.1836
   -0.1561
    0.6625
    1.5449
   -0.0639
   -0.2199
    0.5715
   -0.3491
      ⋮

deltaDeltaTwo = 14×1

   -5.4581
  -11.0101
   -1.1836
   -0.1561
    0.6625
    1.5449
   -0.0639
   -0.2199
    0.5715
   -0.3491
      ⋮

Verify that the difference between coeffsTwo and coeffs vectors equals deltaTwo.

isequal(coeffsTwo-coeffs,deltaTwo)

ans = logical
   1

Verify that the difference between deltaTwo and delta vectors equals deltaDeltaTwo.

isequal(deltaTwo-delta,deltaDeltaTwo)

ans = logical
   1

Algorithms

Auditory Cepstrum Coefficients

Auditory cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.

The motivating idea of cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
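
A generic version of these steps, as they commonly appear in the literature, can be sketched in plain MATLAB: spectrum, filter bank, rectification, then a discrete cosine transform. This is an illustrative sketch under assumptions, not the removed object's exact algorithm. The test signal, the band edges, the use of the magnitude spectrum, and the unnormalized DCT-II are all choices made here for illustration.

```matlab
fs = 16000;
n = (0:511).';
x = sin(2*pi*440*n/fs) + 0.5*sin(2*pi*1200*n/fs);   % hypothetical test frame
N = numel(x);

% 1. One-sided magnitude spectrum and the frequency of each bin.
X = abs(fft(x));
X = X(1:N/2+1);
binHz = (0:N/2).' * fs/N;

% 2. Triangular filter bank from hypothetical band edges (Hz).
edges = [0 250 500 1000 2000 4000 8000];
numBands = numel(edges) - 2;
fb = zeros(numBands, numel(binHz));
for b = 1:numBands
    lo = edges(b); mid = edges(b+1); hi = edges(b+2);
    rising  = (binHz - lo) / (mid - lo);    % ramp up to the center
    falling = (hi - binHz) / (hi - mid);    % ramp down from the center
    fb(b,:) = max(0, min(rising, falling)).';
end

% 3. Rectification ('Log' here; 'Cubic-Root' would use bandEnergy.^(1/3)).
bandEnergy = fb * X;
rectified = log(bandEnergy);

% 4. Unnormalized DCT-II turns the band energies into cepstral coefficients.
k = (0:numBands-1).';
m = 0:numBands-1;
dctMatrix = cos(pi * k * (m + 0.5) / numBands);
cc = dctMatrix * rectified;
```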

Two popular implementations of the filter bank are the mel filter bank and the gammatone filter bank.

Mel Filter Bank

The default mel filter bank linearly spaces the first 10 triangular filters and logarithmically spaces the remaining filters.

Gammatone Filter Bank

The default gammatone filter bank is composed of gammatone filters spaced linearly on the ERB scale between 50 and 8000 Hz. The filter bank is designed by gammatoneFilterBank.
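
The ERB-scale spacing can be illustrated with the Audio Toolbox conversion functions hz2erb and erb2hz. The sketch below only computes candidate center frequencies; it does not design the filters. The number of filters is an assumption, taken from the valid-passband expression given for the NumCoeffs property.

```matlab
frequencyRange = [50 8000];                      % default FrequencyRange (Hz)
erbRange = hz2erb(frequencyRange);               % range limits on the ERB scale

% Assumed filter count, borrowed from the NumCoeffs passband expression.
numFilters = ceil(erbRange(2) - erbRange(1));

% Equal spacing on the ERB scale, then conversion back to Hz.
centerErb = linspace(erbRange(1), erbRange(2), numFilters);
centerHz = erb2hz(centerErb);

% In Hz, the centers are packed more densely at low frequencies.
spacingHz = diff(centerHz);
```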

Log Energy

If the input (x) is a time-domain signal, the log energy is computed using the following equation:

$\log E = \log\left(\operatorname{sum}\left(x^{2}\right)\right)$

If the input (x) is a frequency-domain signal, the log energy is computed using the following equation:

$\log E = \log\left(\operatorname{sum}\left(|x|^{2}\right) / \mathrm{FFTLength}\right)$
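
The two expressions agree when the frequency-domain input is the full two-sided DFT of the time-domain frame and FFTLength equals the frame length, which follows from Parseval's theorem. The sketch below checks this numerically in plain MATLAB; the test signal is a hypothetical example.

```matlab
n = (0:511).';
x = sin(2*pi*0.05*n) + 0.25*cos(2*pi*0.21*n);   % hypothetical time-domain frame
fftLength = numel(x);                           % FFT length equal to the frame length

% Time-domain log energy.
logEnergyTime = log(sum(x.^2));

% Frequency-domain log energy from the two-sided DFT of the same frame.
X = fft(x, fftLength);
logEnergyFreq = log(sum(abs(X).^2) / fftLength);

% The difference is at the level of floating-point round-off.
difference = abs(logEnergyTime - logEnergyFreq);
```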

References

[1] Auditory Toolbox. https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf

[2] ETSI ES 201 108 V1.1.3 (2003-09). https://www.etsi.org/deliver/etsi_es/201100_201199/201108/01.01.03_60/es_201108v010103p.pdf

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

System Objects in MATLAB Code Generation (MATLAB Coder)

Version History

Introduced in R2018a

R2024a: Removed

The cepstralFeatureExtractor object has been removed. Use the mfcc and gtcc functions to compute the same features for batch signals. For streaming applications, improve performance by designing the filter bank once with designAuditoryFilterBank, and then apply the filter bank and extract the same features with cepstralCoefficients and audioDelta in the streaming loop. If you are extracting multiple audio features, use the audioFeatureExtractor object.

`cepstralFeatureExtractor` Configuration	Recommended Replacement
`FilterBank` property set to `"Mel"`	Use the `mfcc` function: `[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav"); [coeffs,delta,deltaDelta] = mfcc(audioIn,fs);` Alternatively, use a combination of `designAuditoryFilterBank` and `cepstralCoefficients`. See Mel Frequency Cepstral Coefficients for an example. To extract delta features with this approach, use `audioDelta`.
`FilterBank` property set to `"Gammatone"`	Use the `gtcc` function: `[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav"); [coeffs,delta,deltaDelta] = gtcc(audioIn,fs);` Alternatively, use a combination of `designAuditoryFilterBank` and `cepstralCoefficients`. See Gammatone Frequency Cepstral Coefficients for an example. To extract delta features with this approach, use `audioDelta`.
`FilterBankDesignDomain` property set to `"Bin"`	No replacement
`FilterBankNormalization` property set to `"Area"` or `"None"`	Use the `designAuditoryFilterBank` function, with the `Normalization` name-value argument set to `"area"` or `"none"`, combined with the `cepstralCoefficients` function.
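
The design-once, apply-per-frame pattern described above might look like the following sketch, which requires Audio Toolbox. The frame length, the Hann window, the non-overlapping frames, and the use of the magnitude spectrum are assumptions made for illustration. For brevity, the delta features are computed after the loop rather than frame by frame, and audioDelta uses its own default window, so the values are not expected to match the frame-to-frame differences of the removed object.

```matlab
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

frameLength = 1024;                          % assumed frame and FFT length
win = hann(frameLength,"periodic");          % assumed analysis window

% Design the mel filter bank once, outside the loop.
filterBank = designAuditoryFilterBank(fs,"FFTLength",frameLength);

numFrames = floor(numel(audioIn)/frameLength);
coeffs = zeros(numFrames,13);                % 13 is the cepstralCoefficients default
for k = 1:numFrames
    frame = audioIn((k-1)*frameLength + (1:frameLength));
    X = abs(fft(frame.*win));
    X = X(1:frameLength/2+1);                % one-sided magnitude spectrum
    % Apply the filter bank, then extract one row of coefficients per frame.
    coeffs(k,:) = cepstralCoefficients(filterBank*X);
end

% Delta features along the time (row) dimension.
delta = audioDelta(coeffs);
deltaDelta = audioDelta(delta);
```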

R2022b: Warns

The cepstralFeatureExtractor object issues a warning that it will be removed in a future release.

R2020b: To be removed

The cepstralFeatureExtractor object runs without warning, but it will be removed in a future release.
