Summing for loop speed-up by multiplication with unity.
Hi
I was benchmarking the speed of for loops versus a reshaped matrix multiplication in the context of tensor multiplication. Essentially I wanted to calculate a third-order tensor fo_{k,l,m} defined as fo_{k,l,m} = sum_{n=1}^{nmax} pt_{k,n}*ft_{n,l,m} from two given objects pt and ft. As expected, I found that implementing this via reshaping and matrix multiplication is much faster than explicit for loops.
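For reference, the reshape-based version mentioned above (which the post does not show explicitly) can be sketched as follows, reusing the array names from the question; this assumes pt is nk-by-nk and ft is nk-by-nl-by-nm:

```matlab
% Sketch of the reshape-based contraction fo_{k,l,m} = sum_n pt_{k,n}*ft_{n,l,m}.
nk = 115; nl = 116; nm = 117;
pt = rand(nk);                 % nk-by-nk
ft = rand([nk, nl, nm]);       % nk-by-nl-by-nm
% Collapse the trailing dimensions of ft into columns, multiply once,
% then restore the original shape (column-major order keeps (l,m) aligned):
fo = reshape(pt * reshape(ft, nk, nl*nm), [nk, nl, nm]);
```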
However, I found some odd behaviour in the speed of the explicit for-loop implementation. On my machine this code takes 18 seconds:
nk = 115;
nl = 116;
nm = 117;
ft = rand([nk, nl, nm]);
fo = zeros(nk, nl, nm);
pt = rand(nk);
tic
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo(cntk,cntl,cntm) = fo(cntk,cntl,cntm) + pt(cntk,cntsum)*ft(cntsum,cntl,cntm);
            end
        end
    end
end
toc
Adding a factor of one in the innermost statement brings the time down to only 4 seconds:
nk = 115;
nl = 116;
nm = 117;
ft = rand([nk, nl, nm]);
fo = zeros(nk, nl, nm);
pt = rand(nk);
tic
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo(cntk,cntl,cntm) = 1*fo(cntk,cntl,cntm) + pt(cntk,cntsum)*ft(cntsum,cntl,cntm);
            end
        end
    end
end
toc
Curiosity made me check the situation with a non-nested loop on a scalar/vector. Here the effect is reversed. This takes 0.45 seconds:
a = 0;
ntest = 100000000;
b = rand(1, ntest);
tic
for cnt = 1:ntest
    a = a + b(cnt);
end
toc
And this takes 0.75 seconds:
a = 0;
ntest = 100000000;
b = rand(1, ntest);
tic
for cnt = 1:ntest
    a = 1*a + b(cnt);
end
toc
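As a side note on methodology (not part of the original measurements): tic/toc timings of JIT-compiled loops are sensitive to warm-up and background load; MATLAB's timeit runs a function several times and reports a median, which gives more stable numbers. A minimal sketch, with the loop moved into a function file so timeit can call it:

```matlab
% benchLoop.m - hedged sketch: time the scalar accumulation loop with
% timeit instead of tic/toc. timeit calls the function repeatedly and
% returns a median time, smoothing out JIT warm-up effects.
function t = benchLoop()
    ntest = 1e7;                     % smaller than the post so timeit returns quickly
    b = rand(1, ntest);
    t = timeit(@() accumulate(b));   % median runtime in seconds
end

function a = accumulate(b)
    a = 0;
    for cnt = 1:numel(b)
        a = a + b(cnt);
    end
end
```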
I wonder what MATLAB (R2015a) does differently when executing the two versions. Any ideas?
Kind regards
Zaph
Answers (1)
My timings under R2015b/64/Win7:
Elapsed time is 16.105212 seconds. % No "1*"
Elapsed time is 16.074966 seconds. % With "1*"
Elapsed time is 0.894157 seconds. % Faster method below
My timings under R2016b/64/Win7:
Elapsed time is 4.882229 seconds. % No "1*"
Elapsed time is 4.767648 seconds. % With "1*"
Elapsed time is 1.163444 seconds. % Faster method below
Obviously JIT acceleration has a strong effect on the runtime. It seems that in your R2015a the JIT profits from the multiplication by 1, for unknown reasons. The JIT is not documented, so we can only speculate about what is going on.
It is nice that the naive loop runs 3 times faster in R2016b, but it is a pity that the faster version below gets 25% slower:
for cntk = 1:nk
    ptv = pt(cntk, :);
    for cntl = 1:nl
        fo(cntk, cntl, :) = ptv * reshape(ft(:, cntl, :), nk, nm);
    end
end
2 Comments
Jan
on 2 Dec 2017
fo=reshape(pt*reshape(ft,[nk,nm*nl]),[nk,nl,nm]);
This takes 0.1 sec on my machine.
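For completeness, one can verify that this one-liner agrees with the quadruple loop from the question; a quick check on smaller sizes (so the loop finishes instantly) could look like this:

```matlab
% Compare the one-line reshape version against the explicit loop result.
nk = 20; nl = 21; nm = 22;      % small sizes, just for the correctness check
pt = rand(nk);
ft = rand([nk, nl, nm]);
fo1 = reshape(pt * reshape(ft, [nk, nl*nm]), [nk, nl, nm]);
fo2 = zeros(nk, nl, nm);
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo2(cntk,cntl,cntm) = fo2(cntk,cntl,cntm) + ...
                    pt(cntk,cntsum) * ft(cntsum,cntl,cntm);
            end
        end
    end
end
max(abs(fo1(:) - fo2(:)))       % should be on the order of eps
```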