Word recognition in the power-frequency domain

(I think it's best to run this code in a Live Script.)
Hi, my assignment is word recognition. As I understand it, we must take the sound signal into the power-frequency domain and use its local maxima to identify the words with the mean square error (MSE) method. I've done this: I found the x and y values of the peaks and stored them in an array (and did the same for all the other words I'm comparing the compare word to). I want to use the immse function to find the smallest error (which would indicate the most likely word), but immse requires both inputs to have the same size, and my saved matrices have different lengths. The number of peaks in the power-frequency domain differs from word to word, so I can't apply this method across different words at all. I don't know of any other way to compute the mean square error, or of another way to measure the difference effectively.
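A sketch of one way around the size mismatch (a swapped-in technique, not the peak-based method from the assignment): instead of comparing variable-length peak lists, summarise each recording's one-sided power spectrum as the fraction of energy in a fixed number of equal-width frequency bands. Every word then becomes a vector of the same length, so the mean square error is always defined. The band count `nBands` and the two synthetic tones standing in for recorded words are assumptions for illustration; `mean((a - b).^2)` is the same quantity `immse(a, b)` returns for equal-sized inputs.

```matlab
% Sketch: fixed-length band-energy features so the MSE is always defined.
% Assumptions (hypothetical): 8 kHz audio, 16 equal-width frequency bands.
Fs = 8000;                        % sample rate, as in the original script
nBands = 16;                      % hypothetical number of bands
t = (0:Fs-1)'/Fs;                 % 1 s time axis (column)

% Two synthetic "words" of DIFFERENT lengths (stand-ins for real recordings).
wordA = sin(2*pi*300*t);                     % 1.0 s, energy at 300 Hz
wordB = sin(2*pi*2500*t(1:round(0.7*Fs)));   % 0.7 s, energy at 2.5 kHz
words = {wordA, wordB};

feats = zeros(nBands, numel(words));
for w = 1:numel(words)
    sig = words{w};
    N = numel(sig);
    P = abs(fft(sig)).^2 / N;                        % two-sided power spectrum
    P = P(1:floor(N/2));                             % keep 0 .. Fs/2 only
    edges = round(linspace(0, numel(P), nBands+1));  % band boundaries in bins
    for b = 1:nBands
        feats(b, w) = sum(P(edges(b)+1:edges(b+1))); % energy in band b
    end
    feats(:, w) = feats(:, w) / sum(feats(:, w));    % normalise: bands sum to 1
end

% Both columns have nBands entries even though the recordings had different
% lengths, so the MSE (what immse computes) can be evaluated directly.
mseAB = mean((feats(:,1) - feats(:,2)).^2);
disp(mseAB)
```

With real recordings you would compute this vector from each .wav file and store it instead of the peak matrix; whether 16 bands separate your particular words is something you would need to test.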
The only important part is actually at the end, where the values from the final graph (the x and y values of the circled peaks, stored in a 2-by-n matrix) are compared. Hz holds the data from the first word, Hz2 holds the data from the second word, and Comp holds the data from the word that needs to be compared against the other two.
If anyone can suggest an effective approach, or give hints on how to use the power-frequency domain to recognize which word was said, that would be much appreciated! Help getting the mean square error method to work would also be great. The files are attached. Thanks! (This code is a heavily simplified version of mine, but it contains all the parts needed for what I'm trying to do.)
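Once every word is stored as a vector of the same size, picking the closest stored word is a nearest-template search. The sketch below uses made-up feature values standing in for Hz, Hz2 and Comp (hypothetical data, not from the attached files). It checks the sizes up front, so a mismatch gives a clear message instead of an immse error, and it records which template was closest rather than only a loop index.

```matlab
% Hypothetical fixed-length feature vectors (e.g. 6 normalised peak frequencies).
templates = {[0.10 0.20 0.30 0.40 0.50 0.60], ...   % stand-in for Hz
             [0.15 0.25 0.45 0.55 0.70 0.90]};      % stand-in for Hz2
names = {'Hz', 'Hz2'};
Comp  = [0.14 0.26 0.44 0.56 0.69 0.88];            % word to classify

bestErr = inf;
bestIdx = 0;
for k = 1:numel(templates)
    T = templates{k};
    % Fail early with a readable message if the feature sizes differ.
    if ~isequal(size(T), size(Comp))
        error('Template %d has size %s; expected %s.', ...
              k, mat2str(size(T)), mat2str(size(Comp)));
    end
    e = mean((Comp(:) - T(:)).^2);   % same value immse(Comp, T) returns
    if e < bestErr
        bestErr = e;
        bestIdx = k;                 % remember WHICH template was closest
    end
end
fprintf('Closest match: %s (MSE = %.4g)\n', names{bestIdx}, bestErr);
```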
My code is:
Fs=8000;
[CompareWord, Fs] = audioread("C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Sounds\CWord.wav");
Ts=1/Fs;
dt=(0:length(CompareWord)-1)*Ts;
nfft=length(CompareWord);
nfft2=2.^nextpow2(nfft);
ff=fft(CompareWord,nfft2);
ff=ff(1:nfft2/2);
ffm=movmax(ff,50);
xfft=Fs*(0:nfft2/2-1)/nfft2;
cut_off=1.2e3/(Fs/2);   % fir1 expects the cutoff normalised to Nyquist (Fs/2): 1.2 kHz
order=32;
h=fir1(order,cut_off);
fh=fft(h,nfft2);
fh=fh(1:nfft2/2);
mul=fh(:).*ff;   % filtering in the frequency domain is an element-wise product of spectra, not a convolution
con=conv(CompareWord,h);
plot(dt,CompareWord);
plot(xfft,abs(ff/max(ff))); %#ok<ADPROPLC>
hold on;
%pks=findpeaks(abs(ffm));
%%Gets center x-coordinates of local maximum values.
TF2=islocalmax(abs(ffm),'FlatSelection',"center");
x=1:length(xfft);
hold on;
plot(x,abs(ff)/max(abs(ff)),x(TF2),abs(ff(TF2)/max(ff)),'r*');
hold off;
stem(h);
plot(abs(fh/max(fh))); %#ok<ADPROPLC>
sound(con);
plot(con);
plot(abs(mul));
TF3=islocalmax(abs(ffm),'FlatSelection',"center");
x=1:length(mul);
hold on;
hold off
m=length(con);
n=pow2(nextpow2(m));
y=fft(con,n);
f=(0:n-1)*(Fs/n);   % frequency of each FFT bin in Hz
power=abs(movmax(y,10)).^2/n;
plot(1:n,power(1:n));
hold on;
%% Peak Values;
[TF4,x]=findpeaks(movmax(power,10),"MinPeakHeight",0.00005, "MinPeakDistance",75);
plot(x,TF4,'o');
horzcat(TF4,x);
err=100000;
for k=1:1
k=num2str(k);
mat=".mat";
%%File names to load
Word1="WordL1_"+k+mat;
Word2="WordR1_"+k+mat;
CompArea="CWord_"+mat;
%%Load .mat files with the 2xn matrices
load(Word1);
load(Word2);
load(CompArea);
if(immse(Comp,Hz)<err)
err=immse(Comp,Hz)
app.Flag=k;
end
if(immse(Comp,Hz2)<err)
err=immse(Comp,Hz2)
app.Flag=k;
end
end
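In the script above, findpeaks runs on power across the full two-sided spectrum (1:n). For a real-valued signal the upper half of an FFT mirrors the lower half, so the peak list can contain mirror-image duplicates, and the x-axis is in bins rather than Hz. A minimal sketch of a one-sided power spectrum with a frequency axis in Hz, using a synthetic two-tone signal in place of con (an assumption for illustration):

```matlab
Fs  = 8000;                                    % sample rate, as in the original script
t   = (0:4095)'/Fs;                            % 4096 samples
sig = sin(2*pi*1000*t) + 0.5*sin(2*pi*250*t);  % synthetic stand-in for con

n  = 2^nextpow2(numel(sig));    % FFT length (power of two)
Y  = fft(sig, n);
P2 = abs(Y).^2 / n;             % two-sided power spectrum
P1 = P2(1:n/2+1);               % one-sided: bins from 0 Hz to Fs/2
P1(2:end-1) = 2*P1(2:end-1);    % fold in the mirrored negative-frequency power
f  = (0:n/2)' * Fs/n;           % frequency axis in Hz

[~, iMax] = max(P1);
fprintf('Strongest component: %.1f Hz\n', f(iMax));
% plot(f, P1); xlabel('Frequency (Hz)'); ylabel('Power');
```

With the Signal Processing Toolbox, findpeaks(P1, f, ...) would then report peak locations in Hz, since findpeaks accepts the sample points as its second input.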

 Accepted Answer

Hi Leon,
From what I understand, you are looking for the local maxima, but this returns vectors of varying lengths?
Have you considered constraining these vectors so that you only take a fixed number 'x' of the most prominent points? 'Prominence' might also be a good indicator, depending on what the signal looks like.
Kind regards,
Christopher
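Christopher's suggestion of keeping a fixed number of the most prominent peaks can be sketched as follows. This is a sketch under assumptions: N = 6 peaks, a simple neighbour comparison instead of findpeaks (so it runs without the Signal Processing Toolbox), zero-padding when a word has fewer than N peaks, and the kept peaks re-sorted by frequency so that the k-th entry of one word lines up with the k-th entry of another. Frequencies are divided by Fs/2 and heights by the tallest kept peak so that neither quantity dominates the MSE. The spectrum is synthetic.

```matlab
Fs = 8000;                 % sample rate, as in the original script
N  = 6;                    % hypothetical number of peaks to keep

% Synthetic one-sided power spectrum with 8 isolated peaks of different heights.
f = (0:511)' * (Fs/2)/511;                  % frequency axis in Hz
P = zeros(512, 1);
peakBins    = [40 90 150 210 260 330 400 470];
peakHeights = [5 1 8 2 7 3 6 4];
P(peakBins) = peakHeights;

% Local maxima: samples strictly greater than both neighbours.
isPk  = [false; P(2:end-1) > P(1:end-2) & P(2:end-1) > P(3:end); false];
pkIdx = find(isPk);

% Keep the N tallest peaks: sort DESCENDING and take the first N.
[~, order] = sort(P(pkIdx), 'descend');
keep = pkIdx(order(1:min(N, numel(order))));
keep = sort(keep);                          % re-order the kept peaks by frequency

% Fixed-size 2-by-N feature: [normalised frequencies; normalised heights].
feat = zeros(2, N);                         % zero-padded if fewer than N peaks
feat(1, 1:numel(keep)) = f(keep)' / (Fs/2);
feat(2, 1:numel(keep)) = P(keep)' / max(P(keep));
disp(feat)
```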

7 Comments

Hi Christopher, yes I have. I considered finding the 6 highest values using the findpeaks function, but for some reason I thought at the time that it wouldn't work out. I'll have a go at it in a while and get back to you. Thanks!
No luck. I used findpeaks with "SortStr","ascend" to get an array sorted by height. I then used
[TF4,x]=findpeaks(power,"MinPeakHeight",0.00005,"MinPeakDistance",75,"SortStr","ascend");
indexes=6:length(TF4);
transpose(TF4);
transpose(x);
TF4(indexes)=[];
x(indexes)=[];
to trim the sorted arrays so that only the first few peaks remain. I then used horzcat(TF4,x) and saved the result as a .mat file, as I do for all my words. I do get an error value now, but it doesn't point to the word I want: I get "one" when I say "5". This result is based on the lowest mean square error I get. Any other hints?
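One thing worth checking in the posted snippet (an observation about the code as written, not something confirmed in the thread): with "SortStr","ascend" the peaks come back smallest first, so deleting elements 6:end keeps the five smallest peaks rather than the largest ones. The transpose(TF4); and transpose(x); lines also have no effect because their results are never assigned. A small demonstration on made-up peak heights:

```matlab
pks = [0.9; 0.2; 0.7; 0.1; 0.5; 0.8; 0.3; 0.6];   % made-up peak heights

% As posted: ascending order, then delete elements 6:end.
asc = sort(pks, 'ascend');
asc(6:length(asc)) = [];
disp(asc')        % the five SMALLEST peaks remain

% Intended: descending order, keep the first 6.
desc = sort(pks, 'descend');
top6 = desc(1:min(6, numel(desc)));
disp(top6')       % the six LARGEST peaks
```

With findpeaks itself, findpeaks(power,"SortStr","descend","NPeaks",6) returns the six tallest peaks and their locations directly.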
Using MinPeakProminence also didn't work; it still gives vectors of different sizes.
Hi Leon,
From what I can tell, this now sounds like it is working, but the predicted words are not always correct. I would consider adding more features to improve detection. This will start to take you down the path of machine learning; consider having a go at the MATLAB Machine Learning Onramp to get a better idea of how to perform feature extraction.
It should also be said that no algorithm is likely to classify real-world data correctly 100% of the time. In addition, if you are using frequency-based power calculations, the accent or sex of the speaker will affect where the power lies in frequency.
Let me know how you are getting on now!
Christopher
Leon Ellis on 14 Nov 2021
Edited: Leon Ellis on 14 Nov 2021
Well, it's been 3 days and I've made no progress with any algorithm I've tried: extracting peak frequencies, obtaining the frequency, time and power of the signal and comparing them, estimating the areas of the audio files and comparing them (in the time and frequency domains), and so on. I'm able to apply the MSE correctly to all the data, but no matter what data I extract, the comparison never gives the results I'm looking for. One person suggested using the pspectrum function to obtain the time and frequency information of the signal. I used that, but still got bad results. I don't have much time for an Onramp course as the project is due soon, but thanks for the advice and help!
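Because recordings of the same word differ in length and speaking rate, one commonly used alternative to a plain MSE (not something suggested in this thread) is dynamic time warping (DTW), which aligns two feature sequences of different lengths before summing the frame-to-frame distances. The Signal Processing Toolbox provides a dtw function; the base-MATLAB sketch below shows the idea on 1-D sequences, where a, b and c are made-up examples rather than real speech features.

```matlab
% Classic DTW between a reference sequence and two candidates of other lengths.
a = [0 1 2 3 2 1 0];            % made-up feature sequence (e.g. frame energies)
b = [0 0 1 2 3 3 2 1 0];        % same shape, "spoken" more slowly
c = [3 2 1 0 1 2 3];            % a different shape

seqs = {b, c};
dist = zeros(1, numel(seqs));
for s = 1:numel(seqs)
    y = seqs{s};
    n = numel(a);
    m = numel(y);
    D = inf(n+1, m+1);          % accumulated-cost matrix with an inf border
    D(1,1) = 0;
    for i = 1:n
        for j = 1:m
            cost = abs(a(i) - y(j));                          % local distance
            D(i+1,j+1) = cost + min([D(i,j+1), D(i+1,j), D(i,j)]);
        end
    end
    dist(s) = D(n+1, m+1);      % total cost of the best alignment
end
disp(dist)                      % smaller means more similar
```

For multi-dimensional frames (for example, spectrogram columns), the local distance abs(a(i) - y(j)) would become a vector norm such as norm(A(:,i) - Y(:,j)).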
Hi Leon,
Personally, I would look towards wavelet transforms to get a better picture of the time-frequency domain.
While I don't like giving out homework answers, I still feel there is a lot of learning and coding to do here if you want to give it a try.
Let me know if this helps you,
Christopher
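As a rough illustration of the wavelet idea (a hand-rolled Haar transform, not the Wavelet Toolbox functions such as cwt or wavedec), the sketch below splits a signal into octave-wide detail sub-bands and uses the normalised energy in each as a fixed-length feature vector. The number of levels L and the 3 kHz test tone are assumptions chosen for illustration.

```matlab
Fs  = 8000;                     % sample rate, as in the original script
L   = 5;                        % hypothetical number of decomposition levels
t   = (0:4095)'/Fs;
sig = sin(2*pi*3000*t);         % synthetic tone at 3 kHz

approx = sig;
energy = zeros(L+1, 1);
for k = 1:L
    nEven  = 2*floor(numel(approx)/2);       % drop an odd trailing sample
    ev     = approx(1:2:nEven);
    od     = approx(2:2:nEven);
    detail = (ev - od) / sqrt(2);            % Haar high-pass half
    approx = (ev + od) / sqrt(2);            % Haar low-pass half, fed to next level
    energy(k) = sum(detail.^2);              % energy in the level-k detail band
end
energy(L+1) = sum(approx.^2);                % remaining low-frequency energy
energy = energy / sum(energy);               % normalise to remove loudness

% Level k detail covers roughly Fs/2^(k+1) .. Fs/2^k Hz (about 2-4 kHz for k = 1).
disp(energy')
```

The Haar filters are not sharp, so some energy leaks into neighbouring bands; the result is still a feature vector of length L+1 for every recording, whatever its duration.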
Thanks, I'll have a look at it and try to get things working.


Release: R2020a
