How to speed up reading from N-dimensional matrix in a nested loop?

Question

0 votes

Hi,

I've got a piece of code that has to run in many nested loops. In the next-to-most inner loop I need to grab some elements from an N-dimensional matrix (N>=5) and perform some calculations on these values. When running the profiler I found that the bottle neck is reading elements from the N-dimensional matrix. The matrix size will typically be [9,9,9,9,9,100] so even though it is big, it's not extremely large.

I wonder if you guys have any good idea on how to speed up this code? Any ideas would be extremely welcome - I've spent days reading on google and trying different things in the profiler, and the quickest way I've found is to stack all the data into a 3D matrix, and looping through the 3rd dimension of this new matrix to perform the calculations.

See a code example below (in this code there might be ways to get rid of some of the loops, but that won't be the case in my real code since that will have a lot of logic on each level). It takes ~1.6sec to run on my laptop, and of this ~1.35 sec is spent reading from the N-dimensional matrix...

Kindly, Anders

% File: test.m  
clear
%%This section can not be changed in my real code).
%Set up some grids:
small_grid=3;
large_grid=4;
H=small_grid;
a=linspace(1,4,large_grid)'; A=length(a);
b=linspace(1,3,small_grid)'; B=length(b);
c=linspace(1,2,small_grid)';C=length(c);
d=linspace(1,4,small_grid)';D=length(d);
e=linspace(1,4,10)';E=length(e);
f=1:10;
[ga gb gc gd ge]=ndgrid(a,b,c,d,e);
I=32;%
fn1=ga./3+gb/.5 +gc./3 +gd./4 +ge/4;%Some function
fn2=ga./1+gb/.1 +gc./10 +gd./4 +ge/.1;%Some other function
soughtValue=zeros(length(f),8);
tic
for h=1:H
  for a_=1:A
    aa=rand(I,1).*(a(end)-a(1))+a(1);%In the real code each level will have a lot of logic...
    for b_=1:B
      bb=rand(I,1).*(b(end)-b(1))+b(1);
      for c_=1:C
        cc=rand(I,1).*(c(end)-c(1))+c(1);
        for d_=1:D
          dd=rand(I,1).*(d(end)-d(1))+d(1);
          for e_=1:E
            ee=rand(I,1).*(e(end)-e(1))+e(1);
              % Calculate which elements to pick out
              before1=floor(rand(length(f),1)*(length(a)-1)+1);
              after1=before1+1;
              before2=floor(rand(length(f),1)*(length(b)-1)+1);
              after2=before2+1;
              c_index=floor(rand(length(f),1)*(C-1)+1);
              d_index=floor(rand(length(f),1)*(D-1)+1);
              % Some parameters
              a1=floor(rand(length(f))*(length(f)-1)+1);
              a2=floor(rand(size(f))*(length(f)-1)+1);
              a3=floor(rand(size(f))*(length(f)-1)+1);
              a4=floor(rand(size(f))*(length(f)-1)+1);
              %%This is the section that can be changed in order to speed up calculations
              for ff=1:length(f)
                %I've found that it's quickest to store the data in a 3D
                %matrix and then loop over the last dimension of this matrix
                %to perform the calculations. But even this is slow!
                new_grid(:,:,1)=  (fn1(:,:,c_index(ff)  ,d_index(ff) ,ff));
                new_grid(:,:,2)=  (fn1(:,:,c_index(ff)+1,d_index(ff) ,ff));
                new_grid(:,:,3)=  (fn1(:,:,c_index(ff)  ,d_index(ff)+1,ff));
                new_grid(:,:,4)=  (fn1(:,:,c_index(ff)+1,d_index(ff)+1,ff));
                new_grid(:,:,5)=  (fn2(:,:,c_index(ff)  ,d_index(ff) ,ff));
                new_grid(:,:,6)=  (fn2(:,:,c_index(ff)+1,d_index(ff) ,ff));
                new_grid(:,:,7)=  (fn2(:,:,c_index(ff)  ,d_index(ff)+1,ff));
                new_grid(:,:,8)=  (fn2(:,:,c_index(ff)+1,d_index(ff)+1,ff));
                for gg=1:8
                  soughtValue(ff,gg) =...
                    ( a1(ff).*new_grid(before1(ff),before2(ff),gg) ) ...
                   +( a2(ff).*new_grid(before1(ff),after2(ff),gg)  ) ...
                   +( a3(ff).*new_grid(after1(ff) ,before2(ff),gg)  ) ...
                   +( a4(ff).*new_grid(after1(ff) ,after2(ff),gg)   );
                end
                % The first iteration of the loop will look like this. Notice
                % that this is many times slower than using the loop!
  %               soughtValue(ff,1)=...
  %                   ( a1(ff).*(fn1(before1(ff),before2(ff),c_index(ff)  ,d_index(ff) ,ff)) ) ...
  %                  +( a2(ff).*(fn1(before1(ff),after2(ff),c_index(ff)  ,d_index(ff) ,ff)) ) ...
  %                  +( a3(ff).*(fn1(after1(ff),before2(ff),c_index(ff)  ,d_index(ff) ,ff)) ) ...
  %                  +( a4(ff).*(fn1(after1(ff),after2(ff),c_index(ff)  ,d_index(ff) ,ff))  );
              end
              %%End of code that can be changed
            end
          end
        end
      end
    end
  end
  toc

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Cedric on 21 Jan 2013

Open in MATLAB Online

0 votes

Hi Anders,

Just by replacing the 8 lines defining new_grid with

 d3index = [c_index(ff),c_index(ff)+1] ;
 new_grid = cat(3, ...
                fn1(:,:,d3index,d_index(ff),ff), ...
                fn1(:,:,d3index,d_index(ff)+1,ff), ...
                fn2(:,:,d3index,d_index(ff),ff), ...
                fn2(:,:,d3index,d_index(ff)+1,ff)) ;

you would make it slightly more efficient. About the loop that follows, are you sure that a1(ff),..,a4(ff) are correct indexing-wise? These are 2D arrays that you are accessing using linear indexing.. which seems weird in this context in the sense that you will get the first column of each matrix with ff in [1,..,numel(f)], which we usually index with e.g. a1(ff,1).

Cheers,

Cedric

7 Comments
Show 5 older comments Hide 5 older comments

Matt J on 22 Jan 2013

Edited: Matt J on 22 Jan 2013

Open in MATLAB Online

The a1 line should read: a1=floor(rand(size(f)) x (length(f)-1)+1);

If that's the case then all lines a1...a4 are computed with the exact same statement. It then makes more sense to compute a single A matrix rather than 4 separate "a" variables

A=floor(rand(length(f),4)*(length(f)-1)+1);

where now a1=A(:,1), a2=A(:,2), etc...

Anders Österling on 23 Jan 2013

Hi, I just gave it a shot and it did indeed work out to be 20-25% faster than my orignal code. Many thanks Cedric!!!

(Regarding the a's: you're correct Matt! It's better to create them using one line and one A-matrix. But in my 'real' code this is not possible and thus I left them as they are!

Sign in to comment.

Answer 2

Matt J on 21 Jan 2013

Edited: Matt J on 21 Jan 2013

Open in MATLAB Online

0 votes

You can get rid of the loop over gg

soughtValue(ff,:)= ...
                  ( a1(ff).*new_grid(before1(ff),before2(ff),:) ) ...
                 +( a2(ff).*new_grid(before1(ff),after2(ff),:)  ) ...
                 +( a3(ff).*new_grid(after1(ff) ,before2(ff),:)  ) ...
                 +( a4(ff).*new_grid(after1(ff) ,after2(ff),:)   );

Also, you should avoid repeated matrix indexing operations. If you're going to be reusing portions of an array, save them to a temporary variable, e.g.,

   cff=c_index(ff);
   dff=d_index(ff);
   tmp=[cff,cff+1];
                  new_grid(:,:,1:2)=  (fn1(:,:,tmp ,dff ,ff));
                  new_grid(:,:,3:4)=  (fn1(:,:,tmp ,dff+1,ff));
                  new_grid(:,:,5:6)=  (fn2(:,:,tmp ,dff ,ff));
                  new_grid(:,:,7:8)=  (fn2(:,:,tmp ,dff+1,ff));

4 Comments
Show 2 older comments Hide 2 older comments

Matt J on 21 Jan 2013

Open in MATLAB Online

Not so surprisingly. PERMUTE is expensive. It wouldn't make sense to do it inside any of the loops. I'm also not sure what this

if all([h,a_,b_,c_,d_,e_]==1)

is for.

Cedric on 21 Jan 2013

It is for permuting fn1 and fn2 only at the first overall iteration, just to fulfill the requirement not the bring modifications outside of the block that he define as being open to modifications.

The if all() doesn't slow down significantly the execution as far as I can see.

Sign in to comment.

Answer 3

Matt J on 21 Jan 2013

Edited: Matt J on 21 Jan 2013

Open in MATLAB Online

0 votes

Alright, well here's my fastest version. With the following, I get a 25% speed-up over the original code.

        fn=cat(6,fn1,fn2); %PRIOR TO LOOPS
             for ff=1:length(f)
               new_grid=  fn(before1(ff)+[0,1],before2(ff)+[0,1],...
                             c_index(ff)+[0,1] , d_index(ff)+[0,1],...
                             ff,:);
                AA(4)=a4(ff);
                AA(3)=a2(ff);
                AA(2)=a3(ff);
                AA(1)=a1(ff);
                soughtValue(ff,:)=AA*reshape(new_grid,4,8);
              end

6 Comments
Show 4 older comments Hide 4 older comments

Anders Österling on 23 Jan 2013

Thanks a lot Matt! The first part of the code is really nice, but it seems like it's a tiny bit slower than Cedrics suggestion. But I will keep it as a concice way of writing it! :)

Regarding the last part: I seems to me that this takes ~twice as much computational time as the 8-iteration-loop. Or am I missing something here?

Cheers, Anders

Matt J on 23 Jan 2013

Edited: Matt J on 23 Jan 2013

Open in MATLAB Online

Hi Anders. Not sure what we're calling "the last part". I definitely find that

soughtValue(ff,:)=AA*reshape(new_grid,4,8);

slows things down, as compared to a for-loop, but I haven't measured whether one is twice as fast as the other. It certainly doesn't make more than a 5% impact on the total code's execution time. My fastest version right now is below. By my timings, it's a few percent slower than Cedric's.

My best guess as to why the for-loop is faster is that, maybe, it doesn't allocate any temporary arrays. Possibly, a matrix-vector multiplication always puts its output in a temporary vector, whereas your for-loop might copy results into soughtValue(ff,gg) in a purely in-place fashion.

A few final remarks on all of this:

Because you're jumping around to random locations in an array pulling out values, there are going to be serious limits on how fast you can get. Memory accesses are fastest when done sequentially.
You're doing lots of redundant calculations throughout the code as whole. It doesn't seem to be a good idea to be calling length(f) and size(f) so repeatedly within a for-loop. Each execution by itself is obviously pretty trivial, but the overhead of all those function calls probably adds up.

           %Pre-loop computations
           fn=cat(6,fn1,fn2);
           new_grid=zeros(4,8);
           nf=length(f);
             for ff=1:nf
              new_grid(:) =  fn(before1(ff)+[0,1],before2(ff)+[0,1],...
                             c_index(ff)+[0,1] , d_index(ff)+[0,1],...
                             ff,:);
             %AA1=a1(ff); AA2=a2(ff); AA3=a3(ff); AA4=a4(ff);
                for gg=1:8
                  soughtValue(ff,gg) =...
                    a1(ff).*new_grid(1,gg)+ ...
                    a2(ff).*new_grid(2,gg)+ ...
                    a3(ff).*new_grid(3,gg)+ ...
                    a4(ff).*new_grid(4,gg);
                end
              end

Sign in to comment.

How to speed up reading from N-dimensional matrix in a nested loop?

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

7 Comments
Show 5 older comments Hide 5 older comments

More Answers (2)

4 Comments
Show 2 older comments Hide 2 older comments

6 Comments
Show 4 older comments Hide 4 older comments

Categories

Products

Tags

Community Treasure Hunt

How to speed up reading from N-dimensional matrix in a nested loop?

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

7 Comments Show 5 older comments Hide 5 older comments

More Answers (2)

4 Comments Show 2 older comments Hide 2 older comments

6 Comments Show 4 older comments Hide 4 older comments

Categories

Products

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

7 Comments
Show 5 older comments Hide 5 older comments

4 Comments
Show 2 older comments Hide 2 older comments

6 Comments
Show 4 older comments Hide 4 older comments