Summing for loop speed-up by multiplication with unity.
Hi
I was benchmarking the speed of for loops versus a reshaped matrix multiplication in the context of tensor multiplication. Essentially I wanted to calculate a third-order tensor fo_{k,l,m} defined as fo_{k,l,m} = sum_{n=1}^{nmax} pt_{k,n}*ft_{n,l,m} from two given objects pt and ft. As expected, I found that implementing this via reshaping and matrix multiplication is much faster than explicit for loops.
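For reference, the reshape-based version mentioned above (which the post does not show explicitly) can be sketched as follows, reusing the array names from the question; this assumes pt is nk-by-nk and ft is nk-by-nl-by-nm:

```matlab
% Sketch of the reshape-based contraction fo_{k,l,m} = sum_n pt_{k,n}*ft_{n,l,m}.
nk = 115; nl = 116; nm = 117;
pt = rand(nk);                 % nk-by-nk
ft = rand([nk, nl, nm]);       % nk-by-nl-by-nm
% Collapse the trailing dimensions of ft into columns, multiply once,
% then restore the original shape (column-major order keeps (l,m) aligned):
fo = reshape(pt * reshape(ft, nk, nl*nm), [nk, nl, nm]);
```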
However, I found some odd behaviour in the speed of the explicit for-loop implementation. On my machine this code takes 18 seconds:
nk = 115;
nl = 116;
nm = 117;
ft = rand([nk, nl, nm]);
fo = zeros(nk, nl, nm);
pt = rand(nk);
tic
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo(cntk,cntl,cntm) = fo(cntk,cntl,cntm) + pt(cntk,cntsum)*ft(cntsum,cntl,cntm);
            end
        end
    end
end
toc
Adding a factor of one in the innermost statement brings the time down to only 4 seconds:
nk = 115;
nl = 116;
nm = 117;
ft = rand([nk, nl, nm]);
fo = zeros(nk, nl, nm);
pt = rand(nk);
tic
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo(cntk,cntl,cntm) = 1*fo(cntk,cntl,cntm) + pt(cntk,cntsum)*ft(cntsum,cntl,cntm);
            end
        end
    end
end
toc
Curiosity made me check the situation with a non-nested loop on a scalar/vector. Here the effect is reversed. This takes 0.45 seconds:
a = 0;
ntest = 100000000;
b = rand(1, ntest);
tic
for cnt = 1:ntest
    a = a + b(cnt);
end
toc
And this takes 0.75 seconds:
a = 0;
ntest = 100000000;
b = rand(1, ntest);
tic
for cnt = 1:ntest
    a = 1*a + b(cnt);
end
toc
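As a side note on methodology (not part of the original measurements): tic/toc timings of JIT-compiled loops are sensitive to warm-up and background load; MATLAB's timeit runs a function several times and reports a median, which gives more stable numbers. A minimal sketch, with the loop moved into a function file so timeit can call it:

```matlab
% benchLoop.m - hedged sketch: time the scalar accumulation loop with
% timeit instead of tic/toc. timeit calls the function repeatedly and
% returns a median time, smoothing out JIT warm-up effects.
function t = benchLoop()
    ntest = 1e7;                     % smaller than the post so timeit returns quickly
    b = rand(1, ntest);
    t = timeit(@() accumulate(b));   % median runtime in seconds
end

function a = accumulate(b)
    a = 0;
    for cnt = 1:numel(b)
        a = a + b(cnt);
    end
end
```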
I wonder what MATLAB (R2015a) does differently when executing the two versions. Any ideas?
Kind regards
Zaph
Answers (1)
My timings under R2015b/64/Win7:
Elapsed time is 16.105212 seconds. % No "1*"
Elapsed time is 16.074966 seconds. % With "1*"
Elapsed time is 0.894157 seconds. % Faster method below
My timings under R2016b/64/Win7:
Elapsed time is 4.882229 seconds. % No "1*"
Elapsed time is 4.767648 seconds. % With "1*"
Elapsed time is 1.163444 seconds. % Faster method below
Obviously JIT acceleration has a strong effect on the runtime. It seems that in your R2015a the JIT profits from the multiplication by 1, for unknown reasons. The JIT is not documented, so we can only speculate about what is going on.
It is nice that the naive loop runs 3 times faster in R2016b, but it is a pity that the faster version below gets 25% slower:
for cntk = 1:nk
    ptv = pt(cntk, :);
    for cntl = 1:nl
        fo(cntk, cntl, :) = ptv * reshape(ft(:, cntl, :), nk, nm);
    end
end
2 Comments
Jan
on 2 Dec 2017
fo=reshape(pt*reshape(ft,[nk,nm*nl]),[nk,nl,nm]);
This takes 0.1 sec on my machine.
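For completeness, one can verify that this one-liner agrees with the quadruple loop from the question; a quick check on smaller sizes (so the loop finishes instantly) could look like this:

```matlab
% Compare the one-line reshape version against the explicit loop result.
nk = 20; nl = 21; nm = 22;      % small sizes, just for the correctness check
pt = rand(nk);
ft = rand([nk, nl, nm]);
fo1 = reshape(pt * reshape(ft, [nk, nl*nm]), [nk, nl, nm]);
fo2 = zeros(nk, nl, nm);
for cntk = 1:nk
    for cntl = 1:nl
        for cntm = 1:nm
            for cntsum = 1:nk
                fo2(cntk,cntl,cntm) = fo2(cntk,cntl,cntm) + ...
                    pt(cntk,cntsum) * ft(cntsum,cntl,cntm);
            end
        end
    end
end
max(abs(fo1(:) - fo2(:)))       % should be on the order of eps
```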