How do I plot the sorted vs unsorted sequences as seen in the documentation?

Hi All,
I'm working with LSTM networks and would like to illustrate the advantage of sorting the data, as shown below and described here, using my own data sets. I know how to plot the sorted data (the blue part), but I don't know how to add the padding, the mini-batches, and the splits. How do I plot these?
Thanks in advance!

Answers (1)

Abhinav Aravindan on 26 Feb 2025
To illustrate the advantage of sorting data with your own dataset, you can use the “barh” function to draw each sequence's length as a horizontal bar.
For instance, in the attached screenshot, the plot illustrates ordering the data by sequence length and padding all the sequences in a mini-batch to the smallest multiple of a chosen split length (sequenceLength = 5 in the code below) that is at least as long as the longest sequence in that mini-batch; the padded sequences are then split into chunks of that length. You can use the code snippet below to generate a similar plot using “barh”.
% Load Data
[Xtrain,Ytrain] = japaneseVowelsTrainData;
seqLen = cellfun(@(X) size(X,2), Xtrain);
% Sort the sequences by length (shortest first)
[sequenceLengthsOrd,idx] = sort(seqLen);
Xtrain = Xtrain(idx);
% Define parameters
batchSize = 27;       % number of sequences per mini-batch
sequenceLength = 5;   % split length: padded lengths are multiples of this value
numBatches = floor(length(Xtrain) / batchSize);   % leftover sequences (partial batch) are not plotted
allOriginalLengths = [];
allPaddedLengths = [];
for i = 1:numBatches
    % Calculate indices and get mini-batch data
    startIdx = (i-1) * batchSize + 1;
    endIdx = i * batchSize;
    X_batch = Xtrain(startIdx:endIdx);
    % Determine the lengths of sequences in the mini-batch
    seqLen_Xbatch = cellfun(@(X) size(X,2), X_batch);
    maxSeqLength = max(seqLen_Xbatch);
    % Pad to the smallest multiple of sequenceLength that is >= the longest sequence
    paddingLength = ceil(maxSeqLength / sequenceLength) * sequenceLength;
    allOriginalLengths = [allOriginalLengths; seqLen_Xbatch];
    allPaddedLengths = [allPaddedLengths; repmat(paddingLength, batchSize, 1)];
end
% Plot
figure;
barh(allOriginalLengths,'BarWidth', 1); % Original lengths
hold on;
barh(allPaddedLengths, 'FaceColor', [0.7, 0.7, 0.9], 'FaceAlpha', 0.5,'BarWidth', 1); % Padded lengths
% Mini-batch boundaries (solid red): horizontal lines at the bottom and top
% of each mini-batch, plus a vertical line at its padded length.
% Split points (dashed red): every multiple of sequenceLength inside the padding.
% All positions are computed from the data, so this works for other datasets too.
for i = 1:numBatches
    yLow = (i-1) * batchSize;
    yHigh = i * batchSize;
    padLen = allPaddedLengths(yHigh);   % padded length of mini-batch i
    plot([0, padLen], [yLow, yLow], '-r', 'LineWidth', 1);
    plot([0, padLen], [yHigh, yHigh], '-r', 'LineWidth', 1);
    plot([padLen, padLen], [yLow, yHigh], '-r', 'LineWidth', 1);
    for s = sequenceLength:sequenceLength:(padLen - sequenceLength)
        plot([s, s], [yLow, yHigh], '--r', 'LineWidth', 1);
    end
end
hold off;
title('Sorted Data');
xlabel('Length');
ylabel('Sequence');
xlim([0, max(allPaddedLengths)]);
Output:
Please refer to the MATLAB documentation for "barh" for more details on its usage.
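The snippet above draws only the sorted case. To show the unsorted and sorted orders side by side, here is a minimal sketch that applies the same padding rule to both orders, draws each with "subplot", and totals the padded time steps so the saving from sorting is visible as a number. Assumptions: seqLen is a made-up vector of example lengths, not real data (replace it with your own lengths, taken before sorting), and the line conventions (solid red batch boundaries, dashed red split points) are a choice for this illustration rather than an exact copy of the documentation figure.

```matlab
% Sketch: same padding rule applied to unsorted and sorted sequence order.
% ASSUMPTION: seqLen holds made-up example lengths; replace it with your own,
% e.g. seqLen = cellfun(@(X) size(X,2), Xtrain); computed before sorting.
seqLen = [7; 22; 9; 15; 26; 8; 12; 20; 11; 25; 10; 18];
seqLen = seqLen(:);   % make sure it is a column vector
batchSize = 4;        % sequences per mini-batch
sequenceLength = 5;   % split length; padded lengths are multiples of this

% Use only complete mini-batches and compare the same sequences in both orders
numUsed = floor(numel(seqLen) / batchSize) * batchSize;
orders = {seqLen(1:numUsed), sort(seqLen(1:numUsed))};
titles = {'Unsorted Data', 'Sorted Data'};
totalPadding = zeros(1, 2);

figure;
for k = 1:2
    L = orders{k};
    % Each column of this matrix is one mini-batch
    batches = reshape(L, batchSize, []);
    % Smallest multiple of sequenceLength that is >= each mini-batch's longest sequence
    batchPadded = ceil(max(batches, [], 1) / sequenceLength) * sequenceLength;
    padded = repelem(batchPadded, batchSize)';   % one padded length per sequence
    totalPadding(k) = sum(padded - L);

    subplot(1, 2, k);
    barh(L, 'BarWidth', 1);   % original lengths
    hold on;
    barh(padded, 'FaceColor', [0.7, 0.7, 0.9], 'FaceAlpha', 0.5, 'BarWidth', 1);   % padded lengths
    for i = 1:numel(batchPadded)
        % Bars are centered on integers, so mini-batch i spans these y-values
        y = [(i-1) * batchSize, i * batchSize] + 0.5;
        plot([0, batchPadded(i)], [y(1), y(1)], '-r', 'LineWidth', 1);   % batch boundaries
        plot([0, batchPadded(i)], [y(2), y(2)], '-r', 'LineWidth', 1);
        plot([batchPadded(i), batchPadded(i)], y, '-r', 'LineWidth', 1);
        % Dashed lines where each padded sequence is split
        for s = sequenceLength:sequenceLength:(batchPadded(i) - sequenceLength)
            plot([s, s], y, '--r', 'LineWidth', 1);
        end
    end
    hold off;
    title(sprintf('%s (%d padded steps)', titles{k}, totalPadding(k)));
    xlabel('Length');
    ylabel('Sequence');
    xlim([0, max(padded)]);
end

fprintf('Unsorted: %d padded time steps\nSorted:   %d padded time steps\n', totalPadding);
```

With the example lengths above, the script reports 137 padded time steps for the unsorted order and 57 for the sorted order. With your own data, the size of the gap depends on how widely the sequence lengths vary within each mini-batch.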
I hope this helps!
