
Why do I obtain a certain dimension in audio features vectors?

I am extracting features from a 30 s audio file, split into 5 s windows, using the following code:
%% Feature extraction with a windowed signal
duration_audio = N/fs; % Total duration of the signal in seconds
window_length = 5; % Window length in seconds
number_windows = floor(duration_audio / window_length); % Whole windows only
% Initialize a matrix for the features
features_matrix = [];
% Loop to extract the features from each time window
for i = 1:number_windows
    % Compute the start and end indices of the current window
    ind_start = (i - 1) * window_length * fs + 1;
    ind_end = i * window_length * fs;
    % Extract the current window from the signal
    window_audio = x(ind_start:ind_end);
    % Compute the spectral features
    spectral_rolloff = spectralRolloffPoint(window_audio, fs);
    spectral_centroid = spectralCentroid(window_audio, fs);
    spectral_kurtosis = spectralKurtosis(window_audio, fs);
    spectral_entropy = spectralEntropy(window_audio, fs);
    spectral_flatness = spectralFlatness(window_audio, fs);
    spectral_crest = spectralCrest(window_audio, fs);
    spectral_flux = spectralFlux(window_audio, fs);
    mfccs = mfcc(window_audio, fs);
    % Concatenate the features side by side (one row per analysis frame)
    features_line = [spectral_rolloff, spectral_centroid, spectral_kurtosis, spectral_entropy, spectral_flatness, spectral_crest, spectral_flux, mfccs];
    % Append the rows for this window to the feature matrix
    features_matrix = [features_matrix; features_line];
end
Every spectral feature is a vector of size 498x1. Does anyone know why there are 498 rows?
Thank you!

Answers (1)

Alan on 9 Oct 2023
Edited: Alan on 16 Oct 2023
Hi Isabella,
From what I understand, you want to know why each spectral feature has a dimension of 498.
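That is because each spectral feature is computed frame by frame from a spectrogram of the audio slice passed to it (5 seconds, as specified in the code). To compute the spectrogram, each slice is further divided into short, overlapping “windows” (frames), and a Fourier transform is computed for each one. When no window is specified, functions such as spectralCentroid and mfcc use a window of round(0.03*fs) samples (30 ms) with an overlap of round(0.02*fs) samples (20 ms), so a new frame starts every 10 ms. A 5-second slice therefore yields floor((5 - 0.02)/0.01) = 498 frames, and each feature has one row per frame. The term “windows” in this context is different from the window_length used in the shared code.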
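As a rough check, the frame count can be reproduced with a few lines of arithmetic. The sketch below assumes the documented defaults of spectralCentroid and the related functions (a window of round(0.03*fs) samples with round(0.02*fs) samples of overlap) and a hypothetical sample rate of 44.1 kHz, which the question does not state:

```matlab
% Assumed values: the question does not give fs, so 44.1 kHz is a placeholder.
fs = 44100;                              % sample rate in Hz (assumption)
sliceLength = 5 * fs;                    % samples in one 5-second slice

% Default analysis window and overlap of the spectral feature functions
winLength = round(0.03 * fs);            % 30 ms window
overlapLength = round(0.02 * fs);        % 20 ms overlap
hopLength = winLength - overlapLength;   % a new frame starts every 10 ms

% Number of complete frames that fit in the slice
numFrames = floor((sliceLength - overlapLength) / hopLength);
disp(numFrames)                          % 498 under these assumptions
```

Because the window and overlap both scale with fs, the result should stay at 498 for other common sample rates such as 16 kHz or 48 kHz.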
The smaller the window size and the larger the overlap length, the more frames (analysis windows) the spectrogram contains, and more frames lead to longer feature vectors. The “window_length” parameter from the code also affects the length of the feature vector, roughly in proportion.
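To see how these parameters interact, here is a small sketch that counts frames for a few window and overlap combinations. The helper name countFrames, the 44.1 kHz sample rate, and the specific window and overlap values are illustrative assumptions, not values taken from the question:

```matlab
% Hypothetical helper: complete frames for a signal of n samples,
% given a window length and an overlap length (both in samples).
countFrames = @(n, win, ovl) floor((n - ovl) / (win - ovl));

n = 5 * 44100;                             % one 5-second slice at an assumed 44.1 kHz

framesBase      = countFrames(n, 1024, 512);   % 429 frames
framesBigWindow = countFrames(n, 2048, 512);   % larger window -> 143 frames
framesBigOvl    = countFrames(n, 1024, 768);   % larger overlap -> 858 frames
framesLongSlice = countFrames(2*n, 1024, 512); % 10-second slice -> 860 frames

disp([framesBase, framesBigWindow, framesBigOvl, framesLongSlice])
```

The same window and overlap can be passed to the feature functions themselves through their Window and OverlapLength name-value arguments.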
To play with these parameters and quickly view outputs, I would highly recommend using the “Extract Audio Features” task in a live script file: https://www.mathworks.com/help/audio/ref/extractaudiofeatures.html
In the task, expand “Specify Window Properties”, and modify the window size and overlap length, and you will observe that the number of rows for each feature will change!
I would also recommend passing the whole audio signal directly into the Extract Audio Features task instead of splitting the signal into 5-second chunks and combining the feature matrices, as the task automatically outputs a single matrix with the combined features for the entire audio signal.
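For a script-based equivalent, one option is the audioFeatureExtractor object from Audio Toolbox. The following is only a sketch: the random signal stands in for the real recording, and the sample rate, window, and overlap values are placeholder assumptions that should be replaced with your own:

```matlab
% Sketch only: requires Audio Toolbox. Signal and parameter values are placeholders.
fs = 44100;                     % assumed sample rate in Hz
x = randn(30 * fs, 1);          % stand-in for the 30-second recording

% Enable the same features as in the question, with an explicit window and overlap
afe = audioFeatureExtractor( ...
    'SampleRate', fs, ...
    'Window', hamming(1024, 'periodic'), ...
    'OverlapLength', 512, ...
    'spectralRolloffPoint', true, ...
    'spectralCentroid', true, ...
    'spectralKurtosis', true, ...
    'spectralEntropy', true, ...
    'spectralFlatness', true, ...
    'spectralCrest', true, ...
    'spectralFlux', true, ...
    'mfcc', true);

features = extract(afe, x);     % one row per analysis frame
idx = info(afe);                % maps each feature name to its column indices
disp(size(features))
```

Here idx.mfcc, for example, gives the columns of features that hold the MFCCs, so no manual concatenation is needed.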
I hope this helped.
