MATLAB Answers

What is the best way to import thousands of data files into multiple matrices?

V.D-C on 18 May 2020
Commented: V.D-C on 18 May 2020
Hello,
I have a folder with 10,000 .mat files; each of them contains a structure with 8 different variables (one value each).
Each file is named according to the row and column it corresponds to in a 330x300 matrix (e.g. 123_25 is the file with the values for the 123rd row and 25th column).
I have 8 matrices, each 330x300 in size, one for each variable and filled with NaNs. I would like to know which technique is best for inserting the values from my .mat files into these matrices.
So far I have tried 2 methods:
  • A double loop with j and k being the row and column indexes respectively (for every j, k loops from 1 to 300). Every time k changes, the corresponding .mat file is loaded and its values are inserted into the corresponding matrices. The 8 matrices are loaded BEFORE the loops.
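%% Example
% (the 8 matrices matrix_1 ... matrix_8 are loaded here, before the loops)
for j = 1:330
    for k = 1:300
        % build the file name from the numeric indices, e.g. '123_25.mat'
        load(sprintf('%d_%d.mat', j, k));
        % j_k stands for the structure loaded from that file
        matrix_1(j,k) = j_k.variable1;
        matrix_2(j,k) = j_k.variable2;
        matrix_3(j,k) = j_k.variable3;
        % etc.
    end
end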
  • The same double loop, but instead of loading the 8 matrices I use the matfile() function and replace only the NaN at index (j,k) with the values from the corresponding .mat file. As before, the corresponding .mat file is loaded at every iteration.
%% Example
% I create one matfile() object for each matrix
for i = 1:numel(list_names)
    save([temp_folder list_names{i} '.mat'], list_names{i}, '-v7.3');
    m.(list_names{i}) = matfile([temp_folder list_names{i} '.mat'], 'Writable', true);
end
%%% list_names is a list with the names of the variables. In this example the list would go from matrix_1 to matrix_8
for j = 1:330
    for k = 1:300
        load(sprintf('%d_%d.mat', j, k));
        % m.matrix_1 is the matfile object; the variable inside it is also named matrix_1
        m.matrix_1.matrix_1(j,k) = j_k.variable1;
        m.matrix_2.matrix_2(j,k) = j_k.variable2;
        % etc.
    end
end
In the matfile documentation I read that the most efficient way to deal with this would be to load everything into memory at once and do all the replacements there. Please note that the 10,000 files together are never larger than 250 MB, and my PC has 16 GB of RAM.
I would like to try another method, which would be:
Load all the .mat files into memory, load the 8 matrices, then insert all the values with a loop, without loading a file at every iteration. However, I face a difficulty: my .mat files have different names, but they are all constructed the same way. So when I load a file into the workspace and then load another one, it replaces the previous one, hence I cannot have 2 files loaded at the same time. Is there a way to load these files altogether at once even though they are built the same way, or is there a way to create dynamic names for variables (I know it is a bad idea) so I can load more than 1 file at a time ?
Finally, which method would be the fastest? Maybe there is another one I didn't think of?
I hope my explanations were clear; if not, I apologize and will try to explain again as clearly as possible.
Have a good day and thank you!


Accepted Answer

Stephen Cobeldick on 18 May 2020
Edited: Stephen Cobeldick on 18 May 2020
I would not use either of the first two methods, because they are an easy way to get latent, almost undetectable errors in your data. You might assume that your data is perfect and write your code accordingly, but that is not a robust approach. Consider what would happen if one of the files is missing any of those variables: the variable value from the previous loop iteration would get used without any warning whatsoever. No doubt you will say "but my data are perfect and are not missing anything..." sure, sure.
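To make that failure mode concrete, here is a minimal sketch. The temporary folder, the file names a.mat and b.mat, and the variable names are made up for illustration; b.mat is deliberately missing variable2:

```matlab
% Hypothetical setup: two small files in a temporary folder.
folder = tempname;
mkdir(folder);
variable1 = 1;
variable2 = 2;
save(fullfile(folder, 'a.mat'), 'variable1', 'variable2');
variable1 = 99;
save(fullfile(folder, 'b.mat'), 'variable1');    % variable2 deliberately left out
clear variable1 variable2

% LOAD without an output argument writes straight into the workspace.
load(fullfile(folder, 'a.mat'));
load(fullfile(folder, 'b.mat'));

% variable1 now comes from b.mat, but variable2 is a silent leftover from a.mat.
stale = variable2;
```

No error or warning is raised at any point, which is why this kind of mistake is so hard to spot afterwards.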
Much more robust (and more efficient) is to load each file into an output variable, so that you can be sure every file supplies the required data:
S = load(...);
and then simply access the fields of the structure S:
S.var1
S.var2
...
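A minimal sketch of that pattern, with a check that the required variables really are in the file. The temporary folder, the file name 123_25.mat and the names var1 and var2 are assumptions for illustration only:

```matlab
% Hypothetical setup: write one small test file so the sketch runs on its own.
folder = tempname;
mkdir(folder);
var1 = 1.5;
var2 = 2.5;
save(fullfile(folder, '123_25.mat'), 'var1', 'var2');

% Load into an output structure instead of into the workspace.
S = load(fullfile(folder, '123_25.mat'));

% Check that every required variable was actually imported.
required = {'var1', 'var2'};
missing = required(~isfield(S, required));
if ~isempty(missing)
    error('File is missing: %s', strjoin(missing, ', '));
end

value1 = S.var1;
value2 = S.var2;
```

Because each call to load returns a fresh structure, a value from a previous file cannot leak into the current iteration.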
This also answers your next question:
"is there a way to load these files altogether at once even though they are built the same way"
Of course, this is MATLAB, so just use a structure array! If every mat file contains exactly the same variable names (as they should) then your task is easy, you can just do this:
S(j,k) = load(...)
If the mat files contain different variable names then go and yell at the person who created them.
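As a rough sketch of the structure-array approach on a small 3x2 grid: the folder, the grid size and the variable names var1 and var2 are invented here, and the sketch assumes that a file exists for every (j,k) pair.

```matlab
% Hypothetical setup: a small 3x2 grid of files named <row>_<col>.mat.
nRows = 3;
nCols = 2;
folder = tempname;
mkdir(folder);
for j = 1:nRows
    for k = 1:nCols
        var1 = 10*j + k;
        var2 = -(10*j + k);
        save(fullfile(folder, sprintf('%d_%d.mat', j, k)), 'var1', 'var2');
    end
end

% Load every file into one element of a structure array.
for j = nRows:-1:1          % counting down allocates S at full size on the first pass
    for k = nCols:-1:1
        S(j,k) = load(fullfile(folder, sprintf('%d_%d.mat', j, k)));
    end
end

% Collect each field into an ordinary numeric matrix.
matrix_1 = reshape([S.var1], size(S));
matrix_2 = reshape([S.var2], size(S));
```

If some grid cells have no file, the comma-separated list [S.var1] skips the empty elements and the reshape no longer lines up, so in that case it is safer to fill preallocated NaN matrices element by element.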
"is there a way to create dynamic names for variables ...so I can load more than 1 file at a time ?"
Importing multiple files can be done simply and efficiently by indexing into one array; there is absolutely no need to use ugly, slow, inefficient, complex, overused-by-beginners dynamic variable names.

  5 Comments

V.D-C on 18 May 2020
I was a bit too optimistic when I tried to make your code work; sadly I don't have a lot of coding experience and everything takes a lot of time. I will mark your answer as "accepted", and as soon as I have the results I will post them here.
I'm just having troubles with loading the variables into 1 structure since the variables are in a substructure in a structure. I will come back when I find a way to make it work.
Stephen Cobeldick on 18 May 2020
"I'm just having troubles with loading the variables into 1 structure "
It looks like you did something like this:
out = load(...);
in which case (if every file contains exactly the same variable names) you can assign that scalar structure to an element of a non-scalar structure using indexing:
S(j,k) = out;
You provided lots of screenshots but did not actually state which structure fields you want to store. If you don't want to store all of that imported data, then just pick the field you want, e.g.:
tmp = load(...);
S(j,k) = tmp.RESULT;
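For the case where only some grid cells have a file, one possible sketch is to loop over the files that exist and read the indices back from each file name. The folder, the field names variable1 and variable2, the extra variable OTHER and the 4x3 grid are all assumptions for illustration; only the RESULT substructure comes from this thread.

```matlab
% Hypothetical setup: a sparse 4x3 grid where only some files exist.
nRows = 4;
nCols = 3;
folder = tempname;
mkdir(folder);
present = [1 1; 2 3; 4 2];                 % (row, column) pairs that have a file
for n = 1:size(present, 1)
    RESULT = struct('variable1', n, 'variable2', 10*n);
    OTHER = 'parameters that stay on disk but are not imported';
    save(fullfile(folder, sprintf('%d_%d.mat', present(n,1), present(n,2))), ...
        'RESULT', 'OTHER');
end

% Preallocate NaN matrices, then fill only the cells that have a file.
matrix_1 = NaN(nRows, nCols);
matrix_2 = NaN(nRows, nCols);
files = dir(fullfile(folder, '*_*.mat'));
for n = 1:numel(files)
    idx = sscanf(files(n).name, '%d_%d.mat');   % idx(1) = row, idx(2) = column
    tmp = load(fullfile(folder, files(n).name), 'RESULT');
    matrix_1(idx(1), idx(2)) = tmp.RESULT.variable1;
    matrix_2(idx(1), idx(2)) = tmp.RESULT.variable2;
end
```

Passing 'RESULT' to load imports only that variable, so the other contents of each file are left untouched on disk.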
V.D-C on 18 May 2020
Thank you again !
I only want to import the variables in the RESULT substructure. The other ones are not important, but they have to be kept in case somebody wants to look at the other parameters.
I tested your suggestion and it works!! Hopefully this solution will stick with me for my whole programming life :)
So thank you very much for your answers !!


More Answers (1)

Steven Lord on 18 May 2020
Rather than creating 8 individual variables why not create a 3-dimensional array of size [330 300 8]?
Z = NaN(6, 5, 4);
for pages = 1:4
    for columns = 1:5
        for rows = 1:6
            Z(rows, columns, pages) = (rows*pages)+(columns^(pages-1));
        end
    end
end
Z(4, 2, 3) % 4*3 + 2^2 = 16
Although with the way your data is ordered, you'd want pages to be the innermost loop. That way you can load your data as soon as rows and columns are defined and iterate through the loaded data in the pages loop, filling in the appropriate elements in Z at each iteration.
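A sketch of that loop order applied to files, under some assumptions: a small 3x2 grid, four scalar variables per file with the made-up names var1 to var4, and a file present for every cell.

```matlab
% Hypothetical setup: a 3x2 grid of files, each with 4 scalar variables.
nRows = 3;
nCols = 2;
names = {'var1', 'var2', 'var3', 'var4'};
folder = tempname;
mkdir(folder);
for j = 1:nRows
    for k = 1:nCols
        data = struct('var1', j, 'var2', k, 'var3', j + k, 'var4', j * k);
        save(fullfile(folder, sprintf('%d_%d.mat', j, k)), '-struct', 'data');
    end
end

Z = NaN(nRows, nCols, numel(names));
for j = 1:nRows
    for k = 1:nCols
        % One load per (row, column) cell.
        S = load(fullfile(folder, sprintf('%d_%d.mat', j, k)));
        % Pages are the innermost loop: one page per variable.
        for p = 1:numel(names)
            Z(j, k, p) = S.(names{p});
        end
    end
end

Z(3, 2, 4)   % var4 at row 3, column 2: 3*2 = 6
```

Each of the original 8 matrices is then simply one page of Z, for example Z(:,:,1).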

  1 Comment

V.D-C on 18 May 2020
Hello, thank you for your answer !
I didn't think at all of doing it this way, I will try it !
