Invalid training data. Responses must be nonempty.

Hello,
I am trying to build a simple network that recognizes gender from voice. I have many recordings. I read them into an audioDatastore, but I can't get them into a sequenceInputLayer. I have tried everything. I know my network may not work well yet because of the layer choices, but I just want to get it training first and improve the accuracy afterwards. Every recording is longer than 6000 samples.
It gives me this error:
Error using trainNetwork (line 183)
Invalid training data. Responses must be nonempty.
Error in Program2 (line 31)
net = trainNetwork(audioTrain,layers, options)
clc;
close all;
clear all;
net = network
audio = audioDatastore(fullfile('E:\Projekt\M or F'), ...
    'IncludeSubfolders',true, ...
    'FileExtension', '.wav', ...
    'LabelSource','foldernames');
labelCount = countEachLabel(audio)
numTrainFiles = 1000;
[audioTrain,audioValidation] = splitEachLabel(audio,numTrainFiles,'randomize');
layers = [ ...
    sequenceInputLayer(6000)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];
options = trainingOptions("adam", ...
    "MaxEpochs",4, ...
    "MiniBatchSize",256, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "Shuffle","every-epoch", ...
    "LearnRateSchedule","piecewise", ...
    "LearnRateDropFactor",0.1, ...
    "LearnRateDropPeriod",1, ...
    'ValidationFrequency',100);
net = trainNetwork(audioTrain,layers, options)

Accepted Answer

jibrahim on 1 Mar 2021
Hi Martin,
You can't pass an audioDatastore directly to trainNetwork. Instead, create a transformed datastore that organizes the data into (audio,label) pairs.
The code below is a simple example where we try to recognize a speaker using an idea similar to yours. The accuracy is not good, but hopefully it is a good starting point.
If you have not done so already, I also recommend looking into the gender ID example in Audio Toolbox.
You might have better luck extracting features from the audio, rather than passing the raw audio to a network.
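For example, a feature-based transform could look like the rough sketch below (not tested; it reuses the adsTrain datastore and the transform pattern from the example further down, audioFeatureExtractor needs a recent Audio Toolbox since it was introduced around R2019b, and FSDD audio is sampled at 8 kHz):
% Sketch: compute MFCC-based features per frame instead of using raw samples
afe = audioFeatureExtractor('SampleRate',8000, ...
    'mfcc',true, ...
    'mfccDelta',true, ...
    'spectralCentroid',true);
tdsFeatTrain = transform(@(x,info)processFeatures(x,afe,info),adsTrain,'IncludeInfo',true);
function [data,info] = processFeatures(audioIn,afe,info)
% extract returns a numFrames-by-numFeatures matrix; transpose it so each
% column is one time step of the sequence fed to the network.
features = {extract(afe,audioIn).'};
label = info.Label;
data = table(features,label);
end
With features, the sequenceInputLayer size becomes the number of features per frame instead of a block of raw samples.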
In any case, here is some example code that trains on the raw audio:
% Download the FSDD data set
url = 'https://ssd.mathworks.com/supportfiles/audio/FSDD.zip';
datasetFolder = tempdir;
unzip(url,datasetFolder)
% Create datastore
% Use speaker name in file name as label
ads = audioDatastore(fullfile(datasetFolder,'FSDD'), ...
    'IncludeSubfolders',true);
[~,filenames] = fileparts(ads.Files);
ads.Labels = categorical(extractBetween(filenames,'_','_'));
[adsTrain,adsValidation] = splitEachLabel(ads,.9);
inputSize = 500;
numHiddenUnits = 100;
numClasses = length(unique(ads.Labels));
layers = [ ...
    sequenceInputLayer(inputSize)
    bilstmLayer(numHiddenUnits,"OutputMode","sequence")
    bilstmLayer(numHiddenUnits,"OutputMode","last")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
% Transformed datastores to be passed directly to network
tdsTrain = transform(@(x,info)processData(x,inputSize,info),adsTrain,'IncludeInfo',true);
tdsValidation = transform(@(x,info)processData(x,inputSize,info),adsValidation,'IncludeInfo',true);
options = trainingOptions("adam", ...
    "MaxEpochs",4, ...
    "MiniBatchSize",256, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "Shuffle","every-epoch", ...
    "LearnRateSchedule","piecewise", ...
    "LearnRateDropFactor",0.1, ...
    "LearnRateDropPeriod",1, ...
    "ValidationData",tdsValidation, ...
    'ValidationFrequency',100);
net = trainNetwork(tdsTrain,layers, options)
Here is the transform function I used:
function [data,info] = processData(audio,inputSize,info)
% Break the audio into sequences of length inputSize with an overlap of
% inputSize/2, and pair each sequence with the file's label.
audio = buffer(audio,inputSize,floor(inputSize/2));
audio = mat2cell(audio,inputSize,ones(1,size(audio,2))).';
label = repmat(info.Label,size(audio,1),1);
data = table(audio,label);
end
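As a quick sanity check (optional; preview works on transformed datastores), you can apply the transform to the first file and confirm that each row pairs one inputSize-long chunk with the file's label:
firstFile = preview(tdsTrain)
size(firstFile.audio{1})   % each chunk is inputSize-by-1, here 500-by-1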
