Make code faster for import and array creation

Dear All,
Here is a bit of code that I wrote to import a bunch of XY data in CSV form, extract the first column from every file and store it as an array (mydata_X), then extract the second column from every file and store it as an array (mydata_Y). Then I made a third array that will order all the files for plotting (mydata_Z).
It works, but it's unbelievably slow. My question is, how can I make it faster?
%bring the data in from CSV format into a cell-array
csvFiles = dir('*.csv'); %works for the current directory only; try to keep only the data you want in the dir
numfiles = length(csvFiles); %finds the total number of files in the dir
mydata = cell(1,numfiles); %one empty cell per file
%save each XY data file into a cell of a cell-array called 'mydata'
for k = 1:numfiles
    mydata{k} = importdata(csvFiles(k).name);
end
%Now I need to generate the X, Y, and Z data
for i=1:numfiles
    mydata_X(:,i) = [mydata{i}(:,1)] % get data from the first column from all cells in mydata
end
for i=1:numfiles
    mydata_Y(:,i) = [mydata{i}(:,2)] % get data from the second column from all cells in mydata
end
for i=1:numfiles
    arrayOfZ = 1:numfiles
    mydata_Z(i,:) = arrayOfZ(1,:) % here I assign 1 through 'numfiles' to offset all spectra
end
%from here I can mesh(mydata_X, mydata_Y, mydata_Z)
Any help would be greatly appreciated!!
Jenna

Accepted Answer

Sven on 23 Sep 2012
Edited: Sven on 23 Sep 2012
Hi Jenna, I bet this is a pre-allocation issue.
When you build a big matrix one column at a time (such as is being done for mydata_X and mydata_Y), MATLAB needs to do some juggling of memory to make the variable a little bit bigger on each iteration. If you know how big the end result will be before you start building your matrix, you can initialise the matrix to this size just once, and MATLAB no longer has the overhead of juggling memory space inside the loop.
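As a rough sketch of what pre-allocation looks like, here is the same column-by-column loop with the output matrices created at full size first. The stand-in data (5 "files" of 100 rows each) and the name numpoints are made up for illustration; in the real code the cells would come from importdata. The trailing semicolons matter too: without them MATLAB prints the assigned variable on every pass, which also costs time.

```matlab
% Stand-in data: 5 "files", each a 100-by-2 matrix (assumed sizes).
numfiles = 5;
numpoints = 100;
mydata = cell(1, numfiles);
for k = 1:numfiles
    mydata{k} = [(1:numpoints)', k * ones(numpoints, 1)];
end

% Allocate the full N-by-NFILES matrices once, before the loop.
mydata_X = zeros(numpoints, numfiles);
mydata_Y = zeros(numpoints, numfiles);

% Fill one column per file; the matrices never have to grow.
for k = 1:numfiles
    mydata_X(:, k) = mydata{k}(:, 1);   % semicolon suppresses printing
    mydata_Y(:, k) = mydata{k}(:, 2);
end
```

This only works as written if every file has the same number of rows, which the original loops already assume.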
That said, there might even be a sneakier way of avoiding the loops you're using. Let's say that after you read all the files:
%bring the data in from CSV format into a cell-array
csvFiles = dir('*.csv'); %works for the current directory only; try to keep only the data you want in the dir
numfiles = length(csvFiles); %finds the total number of files in the dir
mydata = cell(1,numfiles); %one empty cell per file
%save each XY data file into a cell of a cell-array called 'mydata'
for k = 1:numfiles
    mydata{k} = importdata(csvFiles(k).name);
end
At this point, is the matrix of every cell in mydata the same size? It seems that it should be, based on the rest of your code... If it is, you can make a 3D matrix with one "sheet" for each cell in mydata like this:
myData3d = cat(3, mydata{:});
And then you can get mydata_X like this:
mydata_X = myData3d(:,1,:);
Now, mydata_X here will be an N-by-1-by-NFILES matrix, whereas I think from your code that your original mydata_X ended up as an N-by-NFILES matrix. If you want to get the same as your original, you can just use reshape or permute:
mydata_X = permute(myData3d(:,1,:), [1 3 2]);
The same can be done for mydata_Y.
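To see the shapes involved, here is a small self-contained sketch of the cat and permute steps for both columns. The synthetic cells (4 "files" of 6 rows each) and the name numpoints are assumptions standing in for the imported CSV data.

```matlab
% Synthetic stand-in for the imported files: 4 cells, each 6-by-2.
numfiles = 4;
numpoints = 6;
mydata = cell(1, numfiles);
for k = 1:numfiles
    mydata{k} = [(1:numpoints)', 10 * k + (1:numpoints)'];
end

% Stack the cells along the third dimension: numpoints-by-2-by-numfiles.
myData3d = cat(3, mydata{:});

% Pull out each column and move the file index into the second dimension,
% giving numpoints-by-numfiles matrices.
mydata_X = permute(myData3d(:, 1, :), [1 3 2]);
mydata_Y = permute(myData3d(:, 2, :), [1 3 2]);

disp(size(myData3d));
disp(size(mydata_X));
```

Note that cat will throw an error if the cells are not all the same size, so it doubles as a check on that assumption.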
Did this get you going? If it works, it will be much faster than the loop that builds matrices one column at a time.
--
And also, the last loop you have:
for i=1:numfiles
    arrayOfZ = 1:numfiles
    mydata_Z(i,:) = arrayOfZ(1,:) % here I assign 1 through 'numfiles' ...
end
Can be replaced by:
mydata_Z = repmat(1:numfiles, numfiles, 1);
Although I'm not quite sure what purpose this variable serves... it will be of a different size to mydata_X and mydata_Y ...
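To make the size mismatch concrete, here is a short sketch comparing the two shapes. It assumes the goal is one Z value per file, repeated down each column so that Z lines up with an N-by-NFILES mydata_X; that is a guess at the intent, and numpoints, Z_square and Z_match are names made up for this example.

```matlab
numfiles = 4;                            % assumed number of files
numpoints = 6;                           % assumed number of rows per file
mydata_X = zeros(numpoints, numfiles);   % placeholder with the shape of the real data

% The direct replacement for the loop: numfiles-by-numfiles.
Z_square = repmat(1:numfiles, numfiles, 1);

% One possible fix: repeat the row numpoints times instead,
% so Z has the same size as mydata_X (numpoints-by-numfiles).
Z_match = repmat(1:numfiles, numpoints, 1);

disp(size(Z_square));
disp(size(Z_match));
```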
  1 Comment
Jennifer on 23 Sep 2012
Thank you so much Sven!
That worked wonderfully! What took >3hrs to do (don't actually know because it never finished), took only seconds with your suggestion. I did the permute for the Y data and all is working.
For Z, you're right, Z was not the same size. Based on your suggestion, I fixed it to be:
mydata_Z = repmat(1:numDataPoints, numfiles, 1);
where numDataPoints is the number of X data points in each file (3772, the same for all files) and numfiles = 180 in this case.
mydata_Z is the variable in the experiment. These data are time-resolved Raman spectra that I would like to see plotted in 3D. Z can be any scalar offset of time (1 sec between spectra, or 5 minutes between spectra), and I would like those values plotted with the data rather than each spectrum assigned to 1, 2, 3, ..., N in the sequence.
Now I can plot mesh(X,Y,Z) and see the data in 3D.
