Replace nested loops?

Is it possible to replace four nested for loops of the form for / if / for / for / if / for / if ... with something more efficient?
Some of my data sets have millions of values, and I have many data sets to run, so it takes a few days to finish them. Anything that would lower the run time of this section would be greatly appreciated. Thanks.
[EDITED: Code copied from the comments, Jan Simon]
for i = 1:length(starts)
    counter = 0;
    if isempty(starts{i}) == 0
        for j = 1:length(starts{i})
            for k = 1:length(starts)
                if isempty(starts{k}) == 0
                    for m = 1:length(starts{k})
                        if stops{i}(j) >= starts{k}(m) && stops{i}(j) < stops{k}(m) && isempty(peak_loc3{k}) == 0 && peak_loc3{i}(j) ~= peak_loc3{k}(m)
                            counter = counter + 1;
                            overlap{1,i}(counter) = peak_loc3{k}(m);
                            overlap{2,i}(counter) = peak_loc3{i}(j);
                        end
                    end
                end
            end
        end
    end
end

1 Comment

We really need to see the operations to figure out if it's possible. A well orchestrated for-loop should be fairly fast in newer versions.


 Accepted Answer

Sven on 7 Dec 2011
One time drain is that the overlap variable is resized on every pass through the loop. The MATLAB editor marks such variables with a small orange underline; if you hover over it, it warns you about this potential problem.
Here's a first attempt that will reduce the time needed. Note I've also replaced "isempty(x)==0" with "~isempty(x)" (for simplicity) and replaced some of the nested if statements with continue statements, just to have less nesting (which can get confusing).
overlap = cell(2, length(starts));
nonEmptyStarts = find(~cellfun(@isempty,starts));
for i = nonEmptyStarts
    counter = 0;
    thisStart = starts{i};
    thisStop = stops{i};
    for j = 1:length(thisStart)
        for k = nonEmptyStarts
            if isempty(peak_loc3{k}), continue; end
            thatStart = starts{k};
            thatStop = stops{k};
            thisMask = thisStop(j) >= thatStart & thisStop(j) < thatStop & ...
                peak_loc3{i}(j) ~= peak_loc3{k}(1:length(thatStart))';
            for m = find(thisMask)
                counter = counter + 1;
                overlap{1,i}(counter) = peak_loc3{k}(m);
                overlap{2,i}(counter) = peak_loc3{i}(j);
            end
        end
    end
end
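The logical mask on the thisMask line is what replaces the innermost if test. A minimal sketch of the same idea on made-up numbers follows; the values and the names stopJ, peakJ, thatPeak, and hits are assumptions for illustration, not data from this thread:

```matlab
% Toy data, assumed for illustration only
thatStart = [10 20 30 40];     % row vector of interval starts
thatStop  = [15 35 45 50];     % row vector of interval stops
thatPeak  = [1 2 3 4];         % one peak id per interval
stopJ = 32;                    % the single stop value being tested
peakJ = 3;                     % the peak id that belongs to stopJ

% One vectorised test instead of a loop over m
mask = stopJ >= thatStart & stopJ < thatStop & peakJ ~= thatPeak;
hits = find(mask);             % indices m that satisfy all three conditions
```

Here stopJ falls inside the second and third intervals, but the third shares its peak id, so only index 2 survives.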
Unfortunately there is still a big culprit of "variable size adjustment" sitting inside a loop, which will really slow down the code. If you see the line starting with overlap{1,i}(counter) =, you'll notice that every time this line is run, the variable sitting in the cell at overlap{1,i} grows by one. If this happens a lot, MATLAB has to work really hard to find new space in memory fitting this new size.
This updated code runs approximately 10 times faster than the original.
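One common way around the growth problem described above is to allocate a buffer once, fill it, and trim it at the end. This is only a sketch, assuming an upper bound on the number of results is known in advance; maxHits, buf, and the stand-in test are hypothetical and not part of the code in this thread:

```matlab
maxHits = 1000;                 % assumed upper bound on the number of results
buf = zeros(1, maxHits);        % allocate once, outside the loop
counter = 0;
for v = 1:maxHits
    if mod(v, 7) == 0           % stand-in for the real overlap test
        counter = counter + 1;
        buf(counter) = v;       % write into existing memory, no resize
    end
end
buf = buf(1:counter);           % trim the unused tail
```

Whether this helps in practice depends on how tight the upper bound is; a very loose bound trades resizing time for wasted memory.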
UPDATE
overlap = cell(2, length(starts));
nonEmptyStarts = find(~cellfun(@isempty,starts));
for i = nonEmptyStarts
    counter = 0;
    % Get column vectors of the first start/stop pairs
    startA = starts{i}'; stopA = stops{i}';
    for k = nonEmptyStarts
        % Get row vectors of the second start/stop pairs
        startB = starts{k}; stopB = stops{k};
        % Get a mask of all A-B pairs that match requirements
        ABMask = bsxfun(@ge,stopA,startB) & ...
            bsxfun(@lt,stopA,stopB) & ...
            bsxfun(@ne,peak_loc3{i}(1:numel(startA)), peak_loc3{k}(1:numel(startB))');
        [j,m] = find(ABMask);
        numToAdd = length(m);
        if ~numToAdd, continue; end
        % Append them to "overlap"
        indsToInsert = (1:numToAdd) + counter;
        counter = counter + numToAdd;
        overlap{1,i}(indsToInsert) = peak_loc3{k}(m);
        overlap{2,i}(indsToInsert) = peak_loc3{i}(j);
    end
end
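To see what the bsxfun mask does, here is a small sketch on made-up numbers (the values are assumptions for illustration, and the peak comparison is left out to keep it short). A column vector compared against a row vector expands to a matrix holding every A-B combination:

```matlab
% Toy data, assumed for illustration only
stopA  = [12; 32; 60];          % column vector: one row per A entry
startB = [10 20 30];            % row vector: one column per B entry
stopB  = [15 35 45];

% Each bsxfun call expands to a 3-by-3 matrix of all A-B comparisons
ABMask = bsxfun(@ge, stopA, startB) & bsxfun(@lt, stopA, stopB);
[j, m] = find(ABMask);          % j indexes A (rows), m indexes B (columns)
```

In this toy case the first A entry lies in the first B interval and the second A entry lies in the second and third, so find returns the pairs (1,1), (2,2), and (2,3). In MATLAB R2016b and later, the same mask can also be written with plain operators such as stopA >= startB thanks to implicit expansion.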
This update should make significant improvements on a large dataset. There is still room for improvement, depending on the type and sizes of data you have. You can actually get a good view of what parts of the code take the most time by replacing tic and toc with profile on and profile viewer.
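A minimal sketch of that profiling workflow, with a trivial stand-in loop in place of the real code (the loop and the variable names are assumptions for illustration):

```matlab
profile on                      % start collecting timing data
x = zeros(1, 1e4);
for n = 1:numel(x)              % stand-in for the code being measured
    x(n) = sqrt(n);
end
profile off                     % stop collecting
info = profile('info');         % results as a struct
% In MATLAB, "profile viewer" opens the same results as a browsable report
```

The report lists time per function and per line, which usually makes it clearer than scattered tic/toc calls where the time actually goes.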
I have a feeling that the assignment into overlap will still be the biggest area for possible improvement.

12 Comments

Andy on 7 Dec 2011
This is what I already have for overlap:
overlap = cell(2,length(peak_loc3));
starts and stops hold the start/stop positions of peaks, since I am graphing a chromatogram. Each is a cell array, and each cell holds a number of values. peak_loc3 is a cell array, and each cell holds the peaks recorded in it.
Andy on 7 Dec 2011
I put a tic/toc before and after the two assignments to overlap{1,i} and overlap{2,i}:
tic
overlap{1,i}(counter) = peak_loc3{k}(m);
overlap{2,i}(counter) = peak_loc3{i}(j);
toc
It's only taking 0.000016 seconds per iteration. Also, I think there are only 1-2 million iterations to run, so that should take only about 32 seconds.
Sven on 7 Dec 2011
Can you please give code to make just a small example for each variable?
peak_loc3 = {[1 5 6 7 9],[1 8 9 15 20 87]};
starts = {<please fill in>};
stops = {<please fill in>};
We just need enough to run something.
Andy on 7 Dec 2011
peak_loc3 should look like this:
peak_loc3 = {[1;5;6;7;9],[1;8;9;15;20;87]};
starts={[1592 1616 1622 2000 2505],[2992 2848 299 4958]};
stops={[1588 1700 1617 1980 2500],[2956 2840 239 4950]};
Sven on 7 Dec 2011
And each corresponding cell of starts/stops should be the same size?
Andy on 7 Dec 2011
yes
Andy on 7 Dec 2011
They should be, but they don't have to be. I ran the code with the data I posted, and it works.
Sven on 7 Dec 2011
Ok, I've made a small update to replace the innermost loop. It makes a 10x speed increase and gives the same answer as your original. There are still *big* improvements to be made. You say you have ~1-2 million iterations. Is this because starts/stops is a very long cell array, or is it because each cell entry has a very large array of start/stop locations?
Andy on 7 Dec 2011
Because peak_loc3 is a long cell array (a bit more than 100 cells), and some of those cells may contain hundreds of values, so starts and stops will contain the same number of values.
Andy on 7 Dec 2011
I am testing and comparing your fix right now, thanks.
Andy on 7 Dec 2011
Wow, your first update made my code go from 160 seconds to 11 seconds!!
Andy on 7 Dec 2011
The second update took it down to 1 second! Thanks! Let me try running my slowest set of data :D

Asked: 7 Dec 2011