I am using R2017b. I am trying to do the same thing in 3 different ways: (1) using 2 nested for loops, (2) replacing the inner for loop with a built-in function, and (3) replacing both for loops with implicit expansion and a sub-function. The code and timings (in the same order) are listed below.

tic;
c = 1;
for i = 1:length(t)
    n = length(t{i});
    for j = 1:n
        e(1,c) = t{i}(j);
        e(2,c) = t{i}(max(1, mod(j+1, n+1)));
        c = c + 1;
    end
end
toc;

tic;
e = [];
for i = 1:length(t)
    e = [e, [t{i}; circshift(t{i}, -1)]];
end
toc;

tic;
e = cellfun(@cshiftCell, t, 'UniformOutput', false);
toc;

function em = cshiftCell(ec)
    em = [ec; circshift(ec, -1)];
end

Elapsed time is 0.077777 seconds.
Elapsed time is 0.177772 seconds.
Elapsed time is 0.525799 seconds.

This is a bit counter-intuitive to me, because I have always heard that nested for loops lead to the worst performance (except in certain languages like Julia), and I was expecting the timings to be in exactly the reverse order. Does anyone have an explanation of what is going on? I am developing a relatively large-scale code, and understanding such performance aspects of MATLAB is critical to me. I am happy to read any articles/documentation you point me to as well. Thanks!

Edit: Solved! The best explanation was provided by Walter. A superior approach was shown by Cedric.

Cedric Wannaz
on 21 Oct 2017

I won't have time until possibly the end of the weekend, but one thing that should improve the performance of solution 1 is to preallocate e:

e = zeros(2, sum(cellfun('length', t))) ;
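A minimal sketch (my addition, not Cedric's code) of solution 1 with that preallocation in place, assuming t is a cell array of numeric row vectors; growing e column by column is what forces repeated reallocation in the original version:

```
% Solution (1) with e preallocated (sketch; assumes t is a cell array of
% numeric row vectors).
e = zeros(2, sum(cellfun('length', t)));
c = 1;
for i = 1:length(t)
    n = length(t{i});
    for j = 1:n
        e(1,c) = t{i}(j);
        e(2,c) = t{i}(max(1, mod(j+1, n+1)));  % mod/max wraps j = n back to index 1
        c = c + 1;
    end
end
```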

Cedric Wannaz
on 21 Oct 2017

Edited: Cedric Wannaz
on 21 Oct 2017

Well, I got a few minutes at the airport. Try this in your comparison:

tic;
s = cellfun('length', t);
v = cumsum(s);
e_cw = double([t{:}]);
e_cw = [e_cw; e_cw(2:end), 0];
e_cw(2,v) = e_cw(1,[0, v(1:end-1)]+1);
toc
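To see what this does, here is an illustrative run on a tiny input (my example, not from the thread). The second row is first built as a plain one-element shift of the whole concatenated vector, and then only the columns at each cell boundary (indexed by v) are patched so the shift wraps within each cell rather than into the next one:

```
t = {[1 2 3], [4 5]};
s = cellfun('length', t);               % s = [3 2]
v = cumsum(s);                          % v = [3 5], last column of each cell
e_cw = double([t{:}]);                  % [1 2 3 4 5]
e_cw = [e_cw; e_cw(2:end), 0];          % row 2 is shifted, but wrong at cell ends
e_cw(2,v) = e_cw(1,[0, v(1:end-1)]+1);  % fix boundary columns
% e_cw =
%      1     2     3     4     5
%      2     3     1     5     4
```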

(your move, Andrei ;) )

Walter Roberson
on 20 Oct 2017

I do not know if it is still the case, but cellfun() at least used to be implemented as pure MATLAB, and so (at least then) could never be faster than coding the calls yourself.

Calling an anonymous function has a surprising amount of overhead. If you call them a large number of times, that can really add up.

Built-in functions are really divided into two cases: those implemented as MATLAB, and those implemented as compiled code. The ones implemented as MATLAB can never be faster than doing the work yourself, and are often slower because of the error checking and parsing that is done on every call. If you "which" a function name and it says "built-in" then at least the top layer of it is compiled code and there is the potential for being faster than doing the work yourself.
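A quick way to check both points yourself (a sketch; the which output and the exact timings depend on your release and machine):

```
which cellfun       % prints 'built-in' if the top layer is compiled code,
                    % or a path to a .m file if it is implemented in MATLAB

f = @(x) x + 1;     % anonymous-function handle
tic; for k = 1:1e6, y = f(k); end; toc   % pays the handle-call overhead each iteration
tic; for k = 1:1e6, y = k + 1; end; toc  % same arithmetic, inlined in the loop
```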

For loops are typically slower than vectorization -- but only if the equivalent work is being done. It is not uncommon that in order to vectorize, you end up creating non-trivial internal arrays, and vectorization is often unable to take advantage of known stopping conditions. For example, code of the form

for K = 1 : 1e7
    if vectorizable_operation(K) == key_value
        break;
    end
end

can be faster than

find( vectorizable_operation(1:1e7) == key_value, 1, 'first')

because the cost of doing the calculations on the values that will turn out to be unused can be a fair bit. Even in cases where the cost of a particular vectorized operation is quite low, usually less than the "for" overhead, if you are low on memory then looping might turn out to be faster, as it can avoid pushing your memory use to the point where you are swapping.
