logical indexing vs cell indexing - performance

MC on 6 Jul 2021
Edited: dpb on 8 Jul 2021
I am importing [X,Y] data arrays from a number of *.csv files and combining them into a single *.mat file for plotting / analysis later.
Within a single file the X and Y arrays have the same length, but that length differs from file to file, so I can't use a multidimensional numeric array.
Is it more efficient to store the data in a single numeric array with a few identification columns which I can index logically later?
Or is it better to store the [X,Y] data from each file as a separate numeric array in its own cell and access it by normal indexing?
Perhaps my question is... is it better to sort/filter the data as I import it, or when I plot it?
I've included some (poor) pseudocode below to better explain the concepts. I hope this makes sense.
Thanks in advance, Mark.
% Logical indexing example (like a database table).
% Concatenate data from different files into a single numeric array.
% X, Y: column vectors read from the current file; id1, id2: scalar IDs for that file.
data = [];
for file = 1:999
    n = numel(X);
    data = [data; ...
        repmat([file, id1, id2], n, 1), X, Y];
end
% Plot a single X,Y pair of data with the specified IDs.
function plotProfile(data,fileid,id1,id2)
    logicalMask = data(:,1)==fileid & data(:,2)==id1 & data(:,3)==id2;
    plot(data(logicalMask,4),data(logicalMask,5));
end
% Cell array example:
% Store the X and Y arrays in separate cells.
% Cell index defined by file number and other IDs.
% Xc, Yc: cell containers (named differently from the data X, Y they hold).
for file = 1:999
    Xc{file}{id1}{id2} = X;
    Yc{file}{id1}{id2} = Y;
end
% Plot a single X,Y pair of data with the specified IDs.
function plotProfile(Xc,Yc,fileid,id1,id2)
    plot(Xc{fileid}{id1}{id2}, Yc{fileid}{id1}{id2})
end
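One way to answer the efficiency question empirically is to time both lookups on synthetic data. The sketch below is only an illustration under stated assumptions: the file count, array lengths and the use of a single ID column (the file number only, without id1 and id2) are all made up, and real results will depend on the actual data sizes.

```matlab
% Synthetic stand-in for the imported files (sizes and IDs are assumptions).
rng(0);                          % reproducible random data
nFiles = 200;
blocks = cell(nFiles, 1);        % per-file blocks, concatenated once below
XY     = cell(nFiles, 1);        % cell storage: one [X, Y] matrix per file
for file = 1:nFiles
    n = randi([500, 1500]);      % array length differs from file to file
    X = (1:n).';
    Y = rand(n, 1);
    blocks{file} = [repmat(file, n, 1), X, Y];
    XY{file}     = [X, Y];
end
data = vertcat(blocks{:});       % single numeric array: [fileid, X, Y]

fileid = 123;

% Option 1: logical mask over the whole array (compares every row).
byMask = @() data(data(:,1) == fileid, 2:3);

% Option 2: direct cell lookup (no comparison over the rows).
byCell = @() XY{fileid};

% Both options must return the same [X, Y] block.
assert(isequal(byMask(), byCell()));

fprintf('logical mask: %.3g s\n', timeit(byMask));
fprintf('cell lookup:  %.3g s\n', timeit(byCell));
```

The logical mask has to compare every row of the combined array on each call, whereas the cell lookup goes straight to the stored block, so the gap between the two would be expected to grow with the total number of rows.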
  16 Comments
MC on 8 Jul 2021
Edited: MC on 8 Jul 2021
"That is what tables and tall tables are for."
Yes, and the route I am progressing with. Thank you. :-)
"But it also depends on how you need to process your data. A cell array or structure might also be suitable."
Hence my original question. There are many ways to approach this and all would work. But what would be most efficient given the semi-structured data that I'm dealing with...?
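For reference, the table route mentioned above might look roughly like the sketch below. The variable names and the tiny data set are hypothetical; the point is only to show a logical mask on the ID variables next to a one-time split into groups with findgroups and splitapply.

```matlab
% Hypothetical long-format table: one row per sample, ID columns repeated.
fileid = [1; 1; 1; 2; 2];
id1    = [1; 1; 1; 1; 1];
id2    = [3; 3; 3; 4; 4];
X      = [0; 1; 2; 0; 1];
Y      = [5; 6; 7; 8; 9];
T = table(fileid, id1, id2, X, Y);

% Select one profile with a logical mask on the ID variables.
mask = T.fileid == 2 & T.id1 == 1 & T.id2 == 4;
sel  = T(mask, {'X', 'Y'});

% Alternatively, split once into groups and index by group number afterwards.
% ids is a table listing the ID combination that belongs to each group.
[g, ids] = findgroups(T(:, {'fileid', 'id1', 'id2'}));
profiles = splitapply(@(x, y) {[x, y]}, T.X, T.Y, g);  % one cell per ID combination
```

With the second form the filtering cost is paid once at import time, and each later plot only needs a cell index, which mirrors the "sort on import versus filter on plot" choice in the question.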
dpb on 8 Jul 2021
Edited: dpb on 8 Jul 2021
"what tables and tall tables are for ... also depends on how you need to process your data"
But my observations have been similar to the report above: when tables get quite long, performance lags. So while the structure is there, as implemented and refined to date it becomes impractical with large datasets. TMW will undoubtedly continue to improve the implementation over time.
Also, in my initial response I did not recognize the multiple grouping variables ID1 and ID2; I thought the application was just one set of X,Y data for a number of tests and the intent was simply to plot those by test. Adding more criteria makes the rearrangement more appropriate, agreed, but performance may still be a kick in the teeth with the most straightforward approach of just using tables. And the performance hit that the tall table object extracts was clearly demonstrated. If it's the only way, it's probably better than not being able to analyze the data at all, but that definitely comes at a price.

Answers (0)

Release

R2020a