Extract Features from Audio Data Sets
Feature extraction is an important part of machine learning and deep learning workflows for audio signals. For these workflows, you often need to train your model using features extracted from a large data set of audio files. Datastores are useful for working with large collections of data, and the audioDatastore object allows you to manage collections of audio files.
This example shows different approaches to extracting features from an audio data set. It also shows how to use parallel computing to accelerate file reading and feature extraction. Parallel file reading and feature extraction require Parallel Computing Toolbox™.
Create Datastore
Set the useFSDD flag to true to download the Free Spoken Digit Dataset (FSDD) [1], which contains recordings of spoken digits, and create an audioDatastore object that points to the data. Otherwise, create a datastore with a small set of audio recordings of spoken digits.
Set the OutputDataType property to "single" to read the audio data into single-precision arrays. Deep learning workflows often require single-precision data, and using such data can help to speed up feature extraction. You can also set the OutputEnvironment property to "gpu" to return data on the GPU, which can also speed up feature extraction.
useFSDD = false;
fs = 8000; % sample rate of audio data
if useFSDD
    downloadFolder = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
    dataFolder = tempdir;
    unzip(downloadFolder,dataFolder)
    dataset = fullfile(dataFolder,"FSDD","recordings");
    ads = audioDatastore(dataset,OutputDataType="single");
else
    ads = audioDatastore(".",OutputDataType="single");
end
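As a sketch of the GPU option described above, the following code requests GPU output only when a supported GPU is available. The temporary folder, the synthetic recording, and the names folder and demoADS are illustrative assumptions and are not part of the original example. GPU output also assumes that Parallel Computing Toolbox is installed.

```matlab
% Write one synthetic recording to a temporary folder (illustrative data only).
fs = 8000;
folder = tempname;
mkdir(folder)
audiowrite(fullfile(folder,"demo.wav"),0.1*randn(fs,1),fs);

% Request GPU output only if a supported GPU is available.
if canUseGPU
    demoADS = audioDatastore(folder,OutputDataType="single",OutputEnvironment="gpu");
else
    demoADS = audioDatastore(folder,OutputDataType="single");
end

x = read(demoADS);
disp(class(x)) % class of the array returned by read
```

Checking canUseGPU first keeps the same script usable on machines without a GPU.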
To read files and extract features in parallel in this example, call gcp (Parallel Computing Toolbox) to get the current parallel pool of workers. If no parallel pool exists, gcp starts a new pool.
pool = gcp;
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.
Use audioFeatureExtractor Object to Extract Features
The simplest way to extract audio features from a data set is to use the audioFeatureExtractor object. Create an audioFeatureExtractor to extract mel-frequency cepstral coefficients (MFCCs) from each audio file. Call extract to extract the MFCCs from each audio file in the datastore. Calling extract on the datastore requires that all the features from the datastore fit in memory.
afe = audioFeatureExtractor(SampleRate=fs,mfcc=true);
mfccs = extract(afe,ads);
To read the files and extract features in parallel, call extract with UseParallel set to true.
mfccs = extract(afe,ads,UseParallel=true);
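To get a feel for the layout of the extracted features, it can help to run the extractor on a single signal first. The sketch below uses one second of synthetic noise as a stand-in for a file read from the datastore (an assumption for illustration), and calls info to see which output columns hold the MFCCs. The names demoAFE, feats, and idx are hypothetical.

```matlab
% One second of synthetic noise standing in for a file from the datastore.
fs = 8000;
x = randn(fs,1,"single");

demoAFE = audioFeatureExtractor(SampleRate=fs,mfcc=true);
feats = extract(demoAFE,x); % one row per analysis frame, one column per feature
idx = info(demoAFE);        % struct mapping each enabled feature to its columns

disp(size(feats))
disp(idx.mfcc)
```

The column indices returned by info are useful when you enable several features at once and need to separate them later.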
Extract Features in Loop
Another approach to extract all the features is to loop through the files in the datastore. You might choose this method if you have a custom feature extraction algorithm that you cannot implement with audioFeatureExtractor. In this example, you use stft, which computes the short-time Fourier transform (STFT) of the audio signal.
Create a cell array to contain the features for each file. In a loop, read in the audio file from the datastore and extract the features using stft. Store the features in the cell array. This approach requires the features from the whole data set to fit in memory.
numFiles = length(ads.Files);
features = cell(1,numFiles);
for index = 1:numFiles
    x = read(ads);
    features{index} = stft(x);
end
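The loop body is where a custom algorithm would go. As one possible variation, the sketch below computes a one-sided log-magnitude spectrogram with explicit STFT parameters. The window length, overlap, and FFT length are illustrative choices rather than values from this example, and the synthetic signal stands in for x = read(ads).

```matlab
fs = 8000;
x = randn(fs,1,"single"); % stand-in for x = read(ads)

% 256-sample (32 ms at 8 kHz) periodic Hann window with 50% overlap.
win = single(hann(256,"periodic"));
s = stft(x,fs,Window=win,OverlapLength=128,FFTLength=256, ...
    FrequencyRange="onesided");

% Log-magnitude spectrogram; eps avoids taking the log of zero.
logMag = log10(abs(s) + eps("single"));
disp(size(logMag)) % frequency bins by time frames
```

Inside the loop, you would store logMag in features{index} in place of the raw STFT.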
Use Parallel Loop
You can partition the datastore and run the feature extraction loop on the partitions in parallel to improve performance.
Use numpartitions to get a reasonable number of partitions given the number of files and number of workers in the current parallel pool.
numPartitions = numpartitions(ads,pool);
In a parfor (Parallel Computing Toolbox) loop, partition the datastore and extract the features from the files in each partition.
reset(ads)
partitionFeatures = cell(1,numPartitions);
parfor ii = 1:numPartitions
    subds = partition(ads,numPartitions,ii);
    feats = cell(1,numel(subds.Files));
    for index = 1:numel(subds.Files)
        x = read(subds);
        feats{index} = stft(x);
    end
    partitionFeatures{ii} = feats;
end
Concatenate the features from each partition into one cell array.
features = cat(2,partitionFeatures{:});
Transform Datastore
Another approach for extracting features from the data set is to create a TransformedDatastore that applies custom feature extraction when reading in a file. This method is useful when the features from the whole data set do not fit in memory.
Create a function featureExtraction that takes the audio data from a file in the datastore and performs the feature extraction. Call transform on the audioDatastore with the featureExtraction function handle to create a new datastore that performs the feature extraction.
function feats = featureExtraction(x)
    feats = stft(x);
    feats = {feats}; % return features in cell array because they have variable size
end
tds = transform(ads,@featureExtraction);
Calling read on the new datastore reads in a file and performs feature extraction using the provided function.
fileFeatures = read(tds);
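When the features do not fit in memory, one option is to read from the transformed datastore one file at a time and keep only a running summary. The following sketch counts STFT frames across a small synthetic data set. The temporary folder, the generated clips, and the names demoADS and demoTDS are illustrative assumptions, and the anonymous function plays the same role as featureExtraction.

```matlab
% Write three short synthetic recordings to a temporary folder (illustrative data only).
fs = 8000;
folder = tempname;
mkdir(folder)
for k = 1:3
    audiowrite(fullfile(folder,"clip" + k + ".wav"),0.1*randn(fs/2,1),fs);
end

demoADS = audioDatastore(folder,OutputDataType="single");
demoTDS = transform(demoADS,@(x) {stft(x)}); % wrap features in a cell, as featureExtraction does

% Process one file at a time so only one file's features are in memory.
numFrames = 0;
while hasdata(demoTDS)
    feats = read(demoTDS); % 1-by-1 cell holding the STFT of one file
    numFrames = numFrames + size(feats{1},2);
end
disp(numFrames) % total number of STFT frames in the data set
```

The same loop structure applies if you write each file's features to disk instead of accumulating a count.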
You can also use this method to read all the features from the data set into memory by calling readall with the TransformedDatastore.
features = readall(tds);
Read the files and extract features in parallel by calling readall with UseParallel set to true.
features = readall(tds,UseParallel=true);
Next Steps
You can use the data set features to train a machine learning or deep learning model. Combine the features with label information to perform supervised learning. For example, the trainnet (Deep Learning Toolbox) function allows you to train deep neural networks using labeled datastores.
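As a rough sketch of how label information might be obtained, the code below parses the spoken digit from file names that follow the FSDD pattern digit_speaker_index.wav. The file names listed here are hypothetical placeholders. With a real datastore, the same parsing would apply to ads.Files, assuming the files follow that naming pattern.

```matlab
% Hypothetical file names following the FSDD naming pattern.
files = ["0_jackson_0.wav"; "3_nicolas_12.wav"; "9_theo_4.wav"];

[~,names] = fileparts(files);                   % drop folder and extension
labels = categorical(extractBefore(names,"_")); % text before the first underscore is the digit
disp(labels)

% One possible way to pair the labels with a feature datastore such as tds
% (an assumption, not part of the original example):
%   labelDs = arrayDatastore(labels);
%   trainDs = combine(tds,labelDs);
```

Storing the labels as a categorical array keeps the set of classes explicit, which is convenient for classification workflows.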
References
[1] Zohar Jackson, César Souza, Jason Flaks, Yuxin Pan, Hereman Nicolas, and Adhish Thite. “Jakobovski/free-spoken-digit-dataset: V1.0.8”. Zenodo, August 9, 2018. https://doi.org/10.5281/zenodo.1342401.