Hello everyone,

This code snippet takes a long time to run because of the deeply nested for loops. The bounds of the inner loops (a-f) depend on the outer loops (i-k), and Matrixx also depends on the outermost loop (t). Any suggestions on how to vectorize the inner for loops for faster execution? I've tried meshgrid for (a,c,e) and (b,d,f), but the result was not compatible with the size of Bvector.

Thank you!

Nmax=100;
tmax=1000;
Matrixx=zeros(Nmax+3,Nmax+3,Nmax+3,tmax);
Matrixx(3,2,2,1)=10;
parameter=0.1;

for t=2:tmax
    Bvector=zeros(Nmax+3,Nmax+3,Nmax+3);
    for i=2:Nmax+2
        for j=2:Nmax+2
            for k=2:Nmax+2
                for a=2:i
                    for b=2:i
                        for c=2:j
                            for d=2:j
                                for e=2:k
                                    for f=2:k
                                        if (i-2)+(j-2)+(k-2)<=Nmax
                                            % Trailing semicolon added so the result is not printed on every iteration
                                            Bvector(i,j,k)=Bvector(i,j,k)+parameter*Matrixx(a,c,e,t-1)*Matrixx(b,d,f,t-1);
                                            % More similar calculations
                                        end
                                    end
                                end
                            end
                        end
                    end
                end
            end
        end
    end
    % More code: Matrixx gets updated with the Bvector calculated above.
    % Because of the further calculations, this step has to be separate.
    Matrixx(:,:,:,t)=Matrixx(:,:,:,t-1)+(tmax/100).*(Bvector);
end

Aditya Patil
on 23 Sep 2020

Edited: Aditya Patil
on 23 Sep 2020

Although the equation makes the Bvector updates look independent of each other, in reality they repeat the same calculations again and again.

For a given t, the values of Matrixx(:,:,:,t-1) are never modified inside the i, j, k loops. You can therefore reuse the multiplications done on Matrixx for all B(i,j,k) calculations, using dynamic programming.

Basically, B(i,j,k) is made up of the same Matrixx values as B(i,j,k - 1), with a few additions, so B(i,j,k) can be expressed as B(i,j,k - 1) plus the terms in which e = k or f = k.

I might be slightly off on the equation, but the idea remains the same. Try to reuse as many computed values as possible.
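As one possible way to apply this reuse idea to the single update line shown in the question: the sums over (a,c,e) and (b,d,f) are independent of each other, so the six inner loops collapse to parameter*S(i,j,k)^2, where S is the running (prefix) sum of Matrixx(:,:,:,t-1) over indices 2..i, 2..j, 2..k. The sketch below checks that against the brute-force loops on a tiny grid. It is not necessarily the equation intended above, and it assumes the "more similar calculations" have the same product form. The names M, S, mask, Bfast and Bref are made up for this illustration.

```matlab
% Sketch: Bvector from 3-D prefix sums, checked against the brute-force loops.
% Nmax is kept tiny so that the brute-force reference finishes quickly.
Nmax = 3;
parameter = 0.1;
n = Nmax + 3;
idx = 2:Nmax+2;                               % index range used by i, j, k

% M stands in for Matrixx(:,:,:,t-1); filled with arbitrary test values.
M = zeros(n, n, n);
M(idx, idx, idx) = reshape(1:(Nmax+1)^3, Nmax+1, Nmax+1, Nmax+1) / (Nmax+1)^3;

% S(i,j,k) = sum of M(a,c,e) over a = 2..i, c = 2..j, e = 2..k
S = cumsum(cumsum(cumsum(M(idx, idx, idx), 1), 2), 3);

% Logical mask for the condition (i-2)+(j-2)+(k-2) <= Nmax
[I, J, K] = ndgrid(idx, idx, idx);
mask = (I - 2) + (J - 2) + (K - 2) <= Nmax;

Bfast = zeros(n, n, n);
Bfast(idx, idx, idx) = parameter .* S.^2 .* mask;

% Brute-force reference using the loop structure from the question
Bref = zeros(n, n, n);
for i = idx
    for j = idx
        for k = idx
            if (i-2)+(j-2)+(k-2) <= Nmax
                for a = 2:i
                    for b = 2:i
                        for c = 2:j
                            for d = 2:j
                                for e = 2:k
                                    for f = 2:k
                                        Bref(i,j,k) = Bref(i,j,k) + parameter*M(a,c,e)*M(b,d,f);
                                    end
                                end
                            end
                        end
                    end
                end
            end
        end
    end
end

maxErr = max(abs(Bfast(:) - Bref(:)));
fprintf('largest difference between both versions: %g\n', maxErr);
```

If this form applies, the work per time step drops from nine nested loops to three cumsum calls and one element-wise multiplication. Terms that do not factor into a product of two independent sums would need their own treatment.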

The t loop can't be parallelized, as each iteration depends on the previous one. There might be other optimization opportunities, such as reducing cache misses, removing any branching that may exist, etc.

Aditya Patil
on 1 Oct 2020

Note that there are some other issues as well. Given the size of the matrix, you need to ensure minimal cache misses. The if conditions can also be eliminated, but I think that won't make much of a difference as the branch predictor should do a very good job in this case.
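A minimal sketch of one way to eliminate the if condition, assuming the condition stays exactly (i-2)+(j-2)+(k-2)<=Nmax: fold it into the upper bounds of the j and k loops so that excluded combinations are never visited. The counters countIf and countBounds are made up for this illustration and only confirm that both loop nests visit the same number of (i,j,k) triples.

```matlab
% Sketch: replace the if inside the loops with tighter loop bounds.
Nmax = 4;

% Original structure: full loops plus a condition
countIf = 0;
for i = 2:Nmax+2
    for j = 2:Nmax+2
        for k = 2:Nmax+2
            if (i-2)+(j-2)+(k-2) <= Nmax
                countIf = countIf + 1;
            end
        end
    end
end

% Same set of (i,j,k), with the condition moved into the bounds
countBounds = 0;
for i = 2:Nmax+2
    for j = 2:(Nmax+4-i)            % from (i-2)+(j-2) <= Nmax
        for k = 2:(Nmax+6-i-j)      % from (i-2)+(j-2)+(k-2) <= Nmax
            countBounds = countBounds + 1;
        end
    end
end

fprintf('with if: %d, with bounds: %d\n', countIf, countBounds);
```

Besides removing the branch, the tighter bounds also skip the inner a-f loops entirely for the excluded triples, which the original code still iterates through without doing any work.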

It might be the case that the bottleneck is cache, and not computation.

Aditya Patil
on 1 Oct 2020

Normally, memory should be accessed in a linear fashion. If you access memory locations that are not close to each other, you may see high cache miss rates: on every read the CPU cannot find the next value in cache and has to fetch it from main memory. The next read misses the cache again, and so on.

In your case, it seems like you access Matrixx(x,x,f,x) followed by Matrixx(x,x,f + 1,x). These two elements are quite far apart in memory, possibly farther than what your cache can hold.

Try reorganizing Matrixx so that the values you access one after another also come one after another in memory.
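To make the distance concrete, here is a small sketch that uses sub2ind to measure how far apart neighbouring elements are. MATLAB stores arrays in column-major order, so the first index is the contiguous one. The size vector sz mirrors the question's array for Nmax = 100 but with a shortened last dimension, and the small array A and its permuted copy P are made up for this illustration.

```matlab
% Sketch: distance in memory between neighbouring elements (column-major storage).
Nmax = 100;
sz = [Nmax+3, Nmax+3, Nmax+3, 2];   % only the size is used, nothing is allocated

base    = sub2ind(sz, 2, 2, 2, 1);
stride1 = sub2ind(sz, 3, 2, 2, 1) - base;   % step in the 1st index
stride3 = sub2ind(sz, 2, 2, 3, 1) - base;   % step in the 3rd index

fprintf('step in 1st index: %d element\n', stride1);
fprintf('step in 3rd index: %d elements (%d bytes for doubles)\n', stride3, 8*stride3);

% Moving the 3rd dimension to the front makes that index the contiguous one.
A = reshape(1:24, 2, 3, 4);
P = permute(A, [3 1 2]);            % P(k,i,j) equals A(i,j,k)
strideP = sub2ind(size(P), 2, 1, 1) - sub2ind(size(P), 1, 1, 1);
fprintf('step in 1st index of P: %d element\n', strideP);
```

With these sizes a step in the third index jumps over 103*103 = 10609 elements, while a step in the first index moves to the adjacent element. Either permuting the array once per time step or reordering the loops so that the innermost loop runs over the first index would keep consecutive reads adjacent.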
