Error: Invalid training data. Predictors must be a N-by-1 cell array of sequences?
I am unable to resolve the error "Invalid training data. Predictors must be a N-by-1 cell array of sequences, where N is the number of sequences. All sequences must have the same feature dimension and at least one time step."
% Assume 'cnnDataArray' contains CNN-extracted features and 'allLabels' has the corresponding labels.
% Validate that the size of the data matches the labels
%cnnDataArray = cnnDataArray(1:numel(allLabels), :);
% Validate and align data and labels
numSamplesData = size(cnnDataArray, 1);
numSamplesLabels = numel(allLabels);
if numSamplesData > numSamplesLabels
    fprintf('Truncating data to match labels.\n');
    cnnDataArray = cnnDataArray(1:numSamplesLabels, :);
elseif numSamplesLabels > numSamplesData
    fprintf('Truncating labels to match data.\n');
    allLabels = allLabels(1:numSamplesData);
end
% Check consistency
if size(cnnDataArray, 1) ~= numel(allLabels)
    error('Mismatch persists after alignment. Data samples: %d, Labels: %d.', size(cnnDataArray, 1), numel(allLabels));
end
disp('Data and labels are aligned.');
% Keep one label per data row, then convert labels to categorical if not already
allLabels = allLabels(1:size(cnnDataArray, 1));
if ~iscategorical(allLabels)
    allLabels = categorical(allLabels);
end
% Read the dimensions of lstmInput (it must already exist at this point;
% the reshape that would create it is commented out further below)
[numSamples, ~, numFeatures] = size(lstmInput);
% Expand trainDataCell
expandedTrainDataCell = cell(numel(trainDataCell), 1); % New N-by-1 cell array to hold the expanded sequences
for i = 1:numel(trainDataCell)
    % Ensure each sequence is correctly formatted
    % (note: both branches copy the element unchanged)
    if size(trainDataCell{i}, 1) == 1
        % Single time step with F features: no reshape, keep it as [1, F]
        expandedTrainDataCell{i} = trainDataCell{i}; % Keep as is
    else
        % Multiple time steps: keep the sequence structure intact
        expandedTrainDataCell{i} = trainDataCell{i}; % Sequence remains as is
    end
end
% Expand testDataCell the same way
expandedTestDataCell = cell(numel(testDataCell), 1);
for i = 1:numel(testDataCell)
    if size(testDataCell{i}, 1) == 1
        % Single time step with F features: keep it as is
        expandedTestDataCell{i} = testDataCell{i};
    else
        % Otherwise, keep the sequence structure intact
        expandedTestDataCell{i} = testDataCell{i};
    end
end
% Check the size of the expanded cells
disp(size(expandedTrainDataCell)); % Should show [N, 1]
disp(size(expandedTestDataCell)); % Should show [M, 1]
% Now you can use these cell arrays directly for LSTM training
% Do not use cell2mat unless you need a matrix of fixed-size sequences
% Continue with training the LSTM using the expanded cell arrays
% Reshape data for LSTM
%lstmInput = reshape(cnnDataArray, [numSamples, 1, numFeatures]); % [numSamples, 1, numFeatures]
% Verify the reshaped data
%disp(size(lstmInput)); % Should display [numSamples, 1, numFeatures]
% Split data into training and testing sets (e.g., 80-20 split)
%cv = cvpartition(allLabels, 'Holdout', 0.2); % Adjust the holdout ratio if necessary
%trainIdx = training(cv);
Ayush Aniket on 26 Dec 2024 (edited 26 Dec 2024)
Based on the error message, the error occurs because of a discrepancy between the data format the training function expects and the format of your data.
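Before changing the network, it can help to confirm which part of the layout is off. The sketch below uses only base MATLAB and checks three conditions: the predictors form an N-by-1 cell array, every sequence has the same feature dimensions, and every sequence has at least one time step. `X` is hypothetical example data, and the check assumes 2-D image sequences stored as h-by-w-by-c-by-s arrays (`nd = 4`); for sequences of feature vectors stored as c-by-s arrays, you would set `nd = 2` instead.

```matlab
% Hypothetical example predictors: 3 sequences of 1-by-8-by-1 "images"
% with different sequence lengths s (5, 3 and 7 time steps).
X = {rand(1, 8, 1, 5); rand(1, 8, 1, 3); rand(1, 8, 1, 7)};

nd = 4;  % assumed layout: h-by-w-by-c-by-s (time is the last dimension)

% Size of every dimension except the time dimension. size(x, d) returns 1
% for trailing singleton dimensions, so this also works when s = 1.
featDims = @(x) arrayfun(@(d) size(x, d), 1:nd-1);

isNby1   = iscell(X) && iscolumn(X);                      % N-by-1 cell array?
dims     = cellfun(featDims, X, 'UniformOutput', false);  % [h w c] per sequence
sameDims = all(cellfun(@(f) isequal(f, dims{1}), dims));  % all [h w c] equal?
hasSteps = all(cellfun(@(x) size(x, nd) >= 1, X));        % s >= 1 everywhere?

fprintf('N-by-1 cell: %d | same [h w c]: %d | at least one step: %d\n', ...
        isNby1, sameDims, hasSteps);
```

If any of the three flags prints 0, that condition is the one to fix before training.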
For 2-D image sequences (which appears to be your case, based on the information provided), the trainNetwork function expects the predictors as an N-by-1 cell array in which each element is an h-by-w-by-c-by-s array, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and s is the sequence length.
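To make the h-by-w-by-c-by-s layout concrete, the sketch below stacks s individual h-by-w-by-c frames along the fourth dimension with `cat`, then collects several such sequences in an N-by-1 cell array. The frame size, the sequence lengths, and the `frames` variable are hypothetical placeholders filled with random values.

```matlab
h = 32; w = 32; c = 3;        % assumed frame height, width, and channels
seqLengths = [4 6];           % assumed lengths of two example sequences

X = cell(numel(seqLengths), 1);   % N-by-1 cell array of sequences
for n = 1:numel(seqLengths)
    s = seqLengths(n);
    % Hypothetical frames: a 1-by-s cell array of h-by-w-by-c images
    frames = arrayfun(@(k) rand(h, w, c), 1:s, 'UniformOutput', false);
    % Stack the frames along dimension 4 -> h-by-w-by-c-by-s
    X{n} = cat(4, frames{:});
end

disp(size(X));      % [2 1]
disp(size(X{2}));   % [32 32 3 6]
```

Time always sits in the last (fourth) dimension, so sequences of different lengths can share one cell array as long as h, w, and c match.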
Hence, your training data trainDataCell (the predictors) must be an N-by-1 cell array in which each element is a [1 numFeatures 1 s] array.
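A minimal sketch of that conversion, assuming `cnnDataArray` is a numSamples-by-numFeatures matrix of CNN features in which consecutive rows belong to the same sequence. `seqLen` is a hypothetical parameter (not from the question): `seqLen = 1` treats every row as its own one-step sequence, which matches the commented-out `reshape` in the question. The random matrix is a stand-in for real features.

```matlab
% Stand-in data: 12 observations of 16 CNN features each (assumption)
numFeatures  = 16;
cnnDataArray = rand(12, numFeatures);

% Hypothetical: number of consecutive rows that form one sequence
seqLen = 4;
numSeq = floor(size(cnnDataArray, 1) / seqLen);   % leftover rows are dropped

trainDataCell = cell(numSeq, 1);                  % N-by-1 cell array
for n = 1:numSeq
    idx   = (n-1)*seqLen + (1:seqLen);            % rows of sequence n
    block = cnnDataArray(idx, :);                 % seqLen-by-numFeatures
    % Transpose to numFeatures-by-seqLen, then put features along
    % dimension 2 and time along dimension 4: 1-by-numFeatures-by-1-by-seqLen
    trainDataCell{n} = reshape(block.', [1 numFeatures 1 seqLen]);
end

disp(size(trainDataCell));      % [3 1]
disp(size(trainDataCell{1}));   % [1 16 1 4]
```

With `seqLen = 1`, `size` reports each element as 1-by-numFeatures because MATLAB drops trailing singleton dimensions. For sequence-to-label classification, the labels would then need one entry per sequence rather than one per row.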
Refer to the documentation link below to read about the expected format for different types of data:
Note: In the code provided, the imageInputLayer is given the input size [1 numFeatures 1]. However, as its documentation states, the input size is a row vector of integers [h w c], where h, w, and c correspond to the height, width, and number of channels, respectively. If numFeatures is meant to be the number of channels, change the layer's input size to [1 1 numFeatures]. Refer to the documentation here:
If you are working with a different type of data, please share its format.
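For the [1 1 numFeatures] input size suggested in the note above, each sequence would also need its features moved from dimension 2 to dimension 3. A minimal sketch with `permute`, assuming the sequences currently use the 1-by-numFeatures-by-1-by-s layout; the sizes and random values are placeholders.

```matlab
numFeatures = 16;  s = 4;                  % assumed sizes
x = rand(1, numFeatures, 1, s);            % 1-by-numFeatures-by-1-by-s

% Swap dimensions 2 and 3 so the features become channels:
% 1-by-1-by-numFeatures-by-s, matching an input size of [1 1 numFeatures]
xChannels = permute(x, [1 3 2 4]);
disp(size(xChannels));                     % [1 1 16 4]

% The same conversion applied to every element of an N-by-1 cell array
X = {x; rand(1, numFeatures, 1, 2)};
Xchannels = cellfun(@(v) permute(v, [1 3 2 4]), X, 'UniformOutput', false);
disp(size(Xchannels{2}));                  % [1 1 16 2]
```

`permute` only reorders dimensions, so `xChannels(1,1,f,t)` equals `x(1,f,1,t)` for every feature f and time step t; no values are changed.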
