How to efficiently index into memory and call functions within for loops

7 views (last 30 days)
Hello all, I have a code that I would like to speed up a little bit. There's one section where I input an array that has multiple dimentions (4+) and a few of these dimensions are orders of magnitude different in size. I read somewhere that I should always put the largest dimension last but from this quick test code I made that doesn't seem to be the case. The fastest situation on my computer is when a=a_cell{4} which has size (1000, 10000,10). It should also be noted for cases 5 and 6 I'm getting almost 8x slower performance vs the optimal 4th case. Is there any documentation as to why this would be the case, i.e.how matlab would pull memory/cache memory/call functions with the different dimention sizes?
a = rand(10000,1000,10);
possible_perms = perms([1 2 3]);
for n=1:size(possible_perms,1)
a_cell{n} = permute(a,possible_perms(n,:));
end
a_cell = a_cell';
b = zeros(10000);
clc
a = a_cell{1};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc
a = a_cell{2};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc
a = a_cell{3};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc
a = a_cell{4};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc
a = a_cell{5};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc
a = a_cell{6};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc
  5 Comments
tiwwexx
tiwwexx on 29 Nov 2023
The reshape is a good idea for this particular example. But the sum was just to demontrait the phenominon of how matrix ordering can drastically change the time to run the exact same calculations and in general I need the spatial information from the 2d array I get from a(:,:,n).
Bruno Luong
Bruno Luong on 29 Nov 2023
Edited: Bruno Luong on 29 Nov 2023
I guess it is NOT the sum, but the indexing a(n,:,:) or a(:,n,:) or a(:,:,n) that takes extra time.
The memory content of the original array a and such indexing is different (shuffled), so (I guess) MATLAB have to copy and rearrange the memory content BEFORE calling lowlevel sum to perform the iterative plus operations.
This comparison is NOT the same as if you compare purely caching as you put it in the question. What your code does is indexing AND calling the function. This is NOT nested for-loop access to array to retreive iteratively scalars.

Sign in to comment.

Accepted Answer

Bruno Luong
Bruno Luong on 30 Nov 2023
Edited: Bruno Luong on 30 Nov 2023
When you index a hybrid slide (j positive integer scalar)
% x = ...
% j = ...
xidx = x(:,:,:,j,:,:)
(here I take a 6D array as example), MATLAB probably does the following
s = size(x)
p = prod(s(1:3)) % length of the head dimensions
q = prod(s(5:6)) % length of the tail dimensions
xx = reshape(x, [p s(4) q]);
xidx = zeros(p, 1, q); % aloocate array that stores the result
for k = 1:q
% xidx(:,1,k) = xx(:,j,k);
memcpy(xidx(:,1,k), xx(:,j,k), p*sizeof(class(x))); % This is lowlevel fast memory copying, google it if you know to learn more
% copy a conyguous memory block of p numbers
end
s(4) = 1;
xidx = reshape(xidx, s);
Note that p*q = numel(xidx) = numel(x)/size(x,4), so constant under array permutation.
This is fastest when the array x is setup so that p is largest and q is smallest (including 1 value, i.e. j is at the last dimension of x).
In your case it's the for-loop with a(:,:,n) that would be the fastest.
  1 Comment
Bruno Luong
Bruno Luong on 30 Nov 2023
Edited: Bruno Luong on 30 Nov 2023
The above rule seems to be applicable here (I assume the differences of times of sum in various loops are negligible (I also modified the array size to make the effect more pronounced):
test
Elapsed time is 1.280635 seconds. Elapsed time is 2.023572 seconds. Elapsed time is 1.274053 seconds. Elapsed time is 2.023595 seconds. Elapsed time is 13.048555 seconds. Elapsed time is 13.214533 seconds. Elapsed time is 0.519139 seconds.
function test
a = rand(10000,100,100); % Changed by Bruno, was rand(10000,1000,10), 10 is too small to see the effect
possible_perms = perms([1 2 3]);
for n=1:size(possible_perms,1)
a_cell{n} = permute(a,possible_perms(n,:));
end
a_cell = a_cell';
b = zeros(1,10000); % Bruno fix the allocation, vector NOT array
clc
a = a_cell{1};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc % Elapsed time is 0.904677 seconds.
a = a_cell{2};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc % Elapsed time is 1.212156 seconds..
a = a_cell{3};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,:,n),'all');
end
end
toc % Elapsed time is 0.871692 seconds.
a = a_cell{4};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(:,n,:),'all');
end
end
toc % Elapsed time is 1.132273 seconds.
a = a_cell{5};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc % Elapsed time is 6.790436 seconds.
a = a_cell{6};
tic
for k=1:10
for n=1:10000
b(n) = sum(a(n,:,:),'all');
end
end
toc % Elapsed time is 6.875394 seconds.
% Bruno method
a = a_cell{3};
tic
a = reshape(a, [], size(a,3));
for k=1:10
b = sum(a,1);
end
toc % Elapsed time is 0.145326 seconds.
end

Sign in to comment.

More Answers (0)

Products


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!