How can I perform speaker verification for x-vectors based on the ivectorSystem documentation?
I am trying to create a basic voice-based attendance system as a beginner project in biometric security. I am using MathWorks' implementation of x-vector systems for this project. Following the x-vector speaker verification implementation at https://www.mathworks.com/help/audio/ug/speaker-recognition-using-x-vectors.html, I have already trained the TDNN, the x-vector system, and PLDA scoring. I have also obtained thresholds for PLDA and cosine similarity scoring from the detection error tradeoff (DET) figure, using the x-axis value (the threshold) at the EER.
The linked example states that i-vector and x-vector systems share the same classifier backend ("The x-vector system backend, or classifier, is the same as developed for i-vector systems. For details on the backend, see Speaker Verification Using i-vectors and ivectorSystem."). Given that, how would I adapt the verify() function of ivectorSystem (https://www.mathworks.com/help/audio/ref/ivectorsystem.html), as used in the Speaker Verification Using i-vectors example, to work with x-vectors instead? I assume the helper functions in the x-vector speaker recognition example are wrappers around the x-vector pipeline.
Accepted Answer
Brian Hemmat
on 6 May 2024
I don't think you can reuse the verify method for your purpose, but here are the general steps you need to take:
To perform speaker verification, you need a ground truth speaker embedding. It can be an i-vector, an x-vector, etc. If you've already trained the x-vector model using the recipe in the example, you'll want to perform preprocessing and prediction using the same pipeline. Speaker Diarization Using x-vectors uses the x-vector model and walks through the preprocessing steps. Here is just a sketch of what it would look like:
x = knownspeechsignal;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
% extractdata converts the dlarray output to a numeric vector for scoring
embeddingTemplate = extractdata(predict(model,dlarray(features,'TCB'),Outputs="fc_1"));
When you have unknown speech, you perform the same steps.
x = unknownspeechsignal;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingUnknown = extractdata(predict(model,dlarray(features,'TCB'),Outputs="fc_1"));
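Both snippets assume that `globalPrecomputedMean` and `globalPrecomputedSTD` already exist. Here is a minimal sketch of one way to compute them. It assumes per-coefficient statistics pooled over every frame of the training set, which may not match exactly what the example does. The small synthetic matrices are hypothetical stand-ins for `extract(afe,x)` output, with frames as rows and coefficients as columns.

```matlab
% Hypothetical stand-ins for extract(afe,x) output on three training files:
% each matrix is numFrames-by-numCoefficients (frames as rows).
trainFeatures = {reshape(1:40,10,4), 2*reshape(1:60,15,4), reshape(1:20,5,4) + 3};

% Pool all frames from all files, then compute per-coefficient statistics.
allFrames = vertcat(trainFeatures{:});
globalPrecomputedMean = mean(allFrames,1);   % 1-by-numCoefficients
globalPrecomputedSTD  = std(allFrames,0,1);  % 1-by-numCoefficients (must be nonzero)

% The same normalization is then applied to every file (implicit expansion).
normalized = (allFrames - globalPrecomputedMean)./globalPrecomputedSTD;
```

Compute these statistics once, on training data only, and reuse them unchanged for both enrollment and test audio. Otherwise the embeddings are not comparable.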
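To perform speaker verification, score the two embeddings using either PLDA or the cosine similarity score (CSS). Here's an example of CSS. Higher scores mean more similar, so the claim is accepted when the score reaches the threshold:
cssScore = dot(embeddingTemplate,embeddingUnknown)/(norm(embeddingTemplate)*norm(embeddingUnknown));
speakerisverified = cssScore >= threshold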
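The denominator needs parentheses: MATLAB evaluates `/` and `*` left to right. The small base-MATLAB check below shows this with made-up 3-element vectors, which stand in for real embeddings. The threshold is hypothetical. Cosine similarity is bounded by 1, and the version without parentheses is not.

```matlab
% Made-up embeddings (real x-vectors are much longer column vectors).
embeddingTemplate = [1; 2; 2];
embeddingUnknown  = [2; 4; 4.5];
threshold = 0.9;   % hypothetical threshold, e.g. read off a DET/EER analysis

% Correct: divide by the product of the two norms.
score = dot(embeddingTemplate,embeddingUnknown)/(norm(embeddingTemplate)*norm(embeddingUnknown));

% Without parentheses this evaluates as (dot/norm(a))*norm(b), which is not bounded by 1.
wrongScore = dot(embeddingTemplate,embeddingUnknown)/norm(embeddingTemplate)*norm(embeddingUnknown);

% Larger cosine similarity means more similar, so accept at or above the threshold.
speakerisverified = score >= threshold;
```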
You'll need to maintain a list of template embeddings to look up when attempting to perform speaker verification.
Here's a sketch of it all together.
% Create templates for known speakers
x = knownspeechsignal_1;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingTemplate_1 = extractdata(predict(model,dlarray(features,'TCB'),Outputs="fc_1"));
x = knownspeechsignal_2;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingTemplate_2 = extractdata(predict(model,dlarray(features,'TCB'),Outputs="fc_1"));
% Create an enrollment list: one template per column, and a dictionary
% that maps each speaker name to its column
templates = [embeddingTemplate_1,embeddingTemplate_2];
enrolledSpeakers = dictionary(["speaker 1","speaker 2"],[1,2]);
% Extract embedding from unknown speaker
x = unknownspeechsignal;
features = (extract(afe,x)-globalPrecomputedMean)./globalPrecomputedSTD;
embeddingUnknown = extractdata(predict(model,dlarray(features,'TCB'),Outputs="fc_1"));
% Unknown speaker purports to be speaker 1, verify that:
claimedidentity = "speaker 1";
embeddingTemplate = templates(:,enrolledSpeakers(claimedidentity));
cssScore = dot(embeddingTemplate,embeddingUnknown)/(norm(embeddingTemplate)*norm(embeddingUnknown));
speakerisverified = cssScore >= threshold
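If `dictionary` isn't available in your release (it was introduced in R2022b), `containers.Map` can hold the enrollment list instead. Below is a self-contained sketch with made-up embeddings in place of `predict` output. The speaker names, vectors, and threshold are hypothetical. It also rejects a claimed identity that was never enrolled.

```matlab
% Cosine similarity between two column vectors.
cosineScore = @(a,b) dot(a,b)/(norm(a)*norm(b));

% Enrollment list: speaker name -> template embedding (made-up vectors).
enrolledSpeakers = containers.Map('KeyType','char','ValueType','any');
enrolledSpeakers('speaker 1') = [1; 0; 1];
enrolledSpeakers('speaker 2') = [0; 1; 0];

threshold = 0.8;                   % hypothetical, e.g. from the EER analysis
embeddingUnknown = [0.9; 0.1; 1];  % stand-in for the unknown speaker's embedding
claimedidentity = 'speaker 1';

if isKey(enrolledSpeakers,claimedidentity)
    embeddingTemplate = enrolledSpeakers(claimedidentity);
    score = cosineScore(embeddingTemplate,embeddingUnknown);
    speakerisverified = score >= threshold;
else
    speakerisverified = false;     % identity not enrolled: reject the claim
end
```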
The PLDA model is not currently offered standalone. You can use the internal version in ivectorSystem at your own risk (it is not intended to be user-facing and may change at any time). To see an example of using it, step through either the x-vector training example or the diarization example. Alternatively, Speaker Verification Using i-vectors walks through the nitty-gritty of the entire i-vector system, including G-PLDA scoring.
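Because the ivectorSystem PLDA is internal, another option is to score with the two-covariance formulation of PLDA. This is a swapped-in technique, not the ivectorSystem G-PLDA implementation. It models an embedding as x = mu + y + e, with speaker factor y ~ N(0,B) and residual e ~ N(0,W). The score is the log-likelihood ratio of "x1 and x2 share the same y" against "x1 and x2 have independent y". The sketch assumes you already have mu, B, and W. Estimating them (e.g., by EM, or approximately from the between- and within-speaker scatter of labeled training embeddings) is not shown. The 2-D values below are made up.

```matlab
% Hypothetical model parameters (2-D only to keep the example small).
mu = [0; 0];
B  = [1 0.2; 0.2 1];     % between-speaker covariance
W  = 0.1*eye(2);         % within-speaker covariance
T  = B + W;              % total covariance of a single embedding

% Log-density of a zero-mean Gaussian; log-determinant via a Cholesky factor.
logdetFn = @(S) 2*sum(log(diag(chol(S))));
logGauss = @(z,S) -0.5*(z'*(S\z)) - 0.5*logdetFn(S) - 0.5*numel(z)*log(2*pi);

% Joint covariance of [x1; x2] under each hypothesis.
Z = zeros(size(B));
Ssame = [T B; B T];      % shared speaker factor -> cross-covariance B
Sdiff = [T Z; Z T];      % independent speakers -> no cross-covariance

% Log-likelihood ratio: higher means "same speaker" is more likely.
pldaScore = @(x1,x2) logGauss([x1-mu; x2-mu],Ssame) - logGauss([x1-mu; x2-mu],Sdiff);

sameScore = pldaScore([1; 0.5],[0.9; 0.6]);
diffScore = pldaScore([1; 0.5],[-1; -0.4]);
```

A threshold for this score has to be tuned on this score itself. Thresholds from a different PLDA implementation will not carry over.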
Also, depending on the difficulty of your speaker verification task, you might consider using the speakerRecognition function to return a pretrained i-vector system.
Please ask any clarifying questions. I'm hoping to add some examples where the whole detection error tradeoff, identification, and verification steps are componentized.
Brian Hemmat
on 16 May 2024
Edited: Brian Hemmat
on 16 May 2024
I don't follow the first question. I'd suggest trying it, and if it doesn't work, posting the code that led to the error.
Regarding the second question, about a general way to obtain a DET plot and calculate the FAR, FRR, and EER: that's also done explicitly in Speaker Verification Using i-vectors. There are different ways to calculate the DET, depending on which data you use. Often there are explicit pairs you want to score against each other (at least, that's how competitions on the subject usually work).
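When you do have explicit pairs, only those pairs are scored. Below is a minimal sketch. It assumes a hypothetical trial list stored as an N-by-2 matrix of column indices into the embedding matrix, with same-speaker flags derived from the labels. All data here is made up.

```matlab
% Made-up embeddings (one per column) and their speaker labels.
embeddings = [1 0.9 0 0.1; 0 0.1 1 0.9; 1 1 0 0];
labels = [1 1 2 2];

% Hypothetical trial list: each row is [enrollmentIndex, testIndex].
trials = [1 2; 1 3; 3 4; 2 4];

a = embeddings(:,trials(:,1));
b = embeddings(:,trials(:,2));
% Cosine similarity per trial (one score per row of trials).
trialScores = (sum(a.*b,1)./(sqrt(sum(a.^2,1)).*sqrt(sum(b.^2,1))))';
isTarget = (labels(trials(:,1)) == labels(trials(:,2)))';

scoreLike   = trialScores(isTarget);   % same-speaker trials
scoreUnlike = trialScores(~isTarget);  % different-speaker trials
```

The resulting `scoreLike` and `scoreUnlike` can feed the same threshold sweep as in the exhaustive version.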
I've found that just exhaustively pairing all embeddings gives about the same results. Below is a sketch of that.
Assume we have a matrix of embedding vectors output from your model.
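embeddingLength = 200;
numEmbeddings = 20*30;   % 30 speakers x 20 recordings each
embeddings = rand(embeddingLength,numEmbeddings);   % placeholder: replace with your model's embeddings, one per column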
Each embedding vector has a corresponding label, so the elements of labels correspond to the columns of embeddings.
labels = categorical(repelem(1:30,20));
Calculate scores for all pairs of embeddings--we'll throw away the repetitions later.
scoresmat = css(embeddings,embeddings);
Create a matrix that says whether each pair of labels belongs to the same or different speakers.
uniqueLabels = unique(labels);
class_matrix = labels'==labels;
Isolate the scores that correspond to matched pairs and the scores that correspond to unmatched pairs.
n = size(scoresmat,1);
lower_triangular_logical = tril(ones(n, n), -1) == 1;
scoresmat(~lower_triangular_logical) = nan;
scoreLike = scoresmat(class_matrix);
scoreUnlike = scoresmat(~class_matrix);
scoreLike(isnan(scoreLike)) = [];
scoreUnlike(isnan(scoreUnlike)) = [];
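As a sanity check on the masking step: the strictly lower-triangular mask keeps each unordered pair once. For N embeddings that gives N(N-1)/2 pairs, and a speaker with n embeddings contributes n(n-1)/2 same-speaker pairs. For 30 speakers with 20 embeddings each, that is 30·190 = 5,700 like pairs and 179,700 - 5,700 = 174,000 unlike pairs. The snippet below reproduces these counts with the same masking logic. It uses numeric labels and new variable names so it doesn't overwrite the walkthrough's variables.

```matlab
numSpeakers = 30;
perSpeaker = 20;
demoLabels = repelem(1:numSpeakers,perSpeaker);  % numeric stand-in for the categorical labels
numAll = numel(demoLabels);

sameMask = demoLabels' == demoLabels;            % true where both embeddings share a speaker
pairMask = tril(ones(numAll),-1) == 1;           % strictly lower triangle: each unordered pair once

numLike   = nnz(sameMask & pairMask);
numUnlike = nnz(~sameMask & pairMask);
```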
Define a range of thresholds to test.
numThresholdsInSweep = 1000;
Thresholds = linspace(min(scoreUnlike),max(scoreLike),numThresholdsInSweep);
Calculate the false reject rate for each threshold in the sweep.
FRR = mean(scoreLike(:)<Thresholds(:)',1);
Calculate the false acceptance rate for each threshold in the sweep.
FAR = mean(scoreUnlike(:)>=Thresholds(:)',1);
Get the threshold where the FRR and FAR intersect (a better version of this would interpolate the points before and after).
[~,EERThresholdIdx] = min(abs(FRR-FAR));
EERThreshold = Thresholds(EERThresholdIdx);
Calculate the EER.
EER = mean([FAR(EERThresholdIdx),FRR(EERThresholdIdx)]);
Plot the results.
figure
plot(Thresholds,FRR,"k"), hold on
plot(Thresholds,FAR,"b")
plot(EERThreshold,EER,"ro",MarkerFaceColor="r")
title(["Equal Error Rate = " + round(EER,4),"Threshold = " + round(EERThreshold,4)])
xlabel('Threshold')
ylabel('Error Rate')
legend('FRR','FAR','Equal Error Rate (EER)')
grid on
axis tight
hold off
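The EER step above takes the nearest grid point. As noted there, interpolating around the crossing is better. The sketch below shows one way to do it, which is an assumption rather than the method used in the MathWorks examples. It finds the first threshold where FRR - FAR becomes non-negative, then linearly interpolates between that threshold and the previous one. The synthetic, evenly spaced scores make the answer easy to check by hand: the curves cross near a threshold of 0.5, at an error rate of 0.29.

```matlab
% Synthetic scores: like pairs spread over [0.3,1], unlike pairs over [0,0.7].
scoreLike   = linspace(0.3,1,100)';
scoreUnlike = linspace(0,0.7,100)';

Thresholds = linspace(min(scoreUnlike),max(scoreLike),1000);
FRR = mean(scoreLike(:) < Thresholds(:)',1);    % like pairs rejected
FAR = mean(scoreUnlike(:) >= Thresholds(:)',1); % unlike pairs accepted

% FRR rises and FAR falls as the threshold grows, so FRR-FAR is non-decreasing.
d = FRR - FAR;
k = find(d >= 0,1);
if isempty(k) || k == 1
    % No bracketing pair of grid points: fall back to the nearest grid point.
    [~,k] = min(abs(d));
    EERThreshold = Thresholds(k);
    EER = mean([FAR(k),FRR(k)]);
else
    % Interpolate linearly between grid points k-1 and k, where d changes sign.
    alpha = -d(k-1)/(d(k)-d(k-1));
    EERThreshold = Thresholds(k-1) + alpha*(Thresholds(k)-Thresholds(k-1));
    EER = FAR(k-1) + alpha*(FAR(k)-FAR(k-1));
end
```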
Supporting Functions
function y = css(w1,wt)
% Cosine similarity of every column of w1 against every column of wt.
% Returns a size(w1,2)-by-size(wt,2) matrix. A matrix product avoids building
% a large 3-D intermediate array. Save as css.m on your path to use.
y = (w1.'*wt)./(vecnorm(w1).'*vecnorm(wt));
end