# In a one-dimensional cell array, which is faster: 'a{i} = ...' or 'a{i,1} = ...'?

Alon Rozen on 17 Nov 2016
Commented: Alon Rozen on 20 Nov 2016
Hi all,
I am using a few large one-dimensional cell arrays (possibly a few tens of thousands of elements each). Something like:
My_Cells = cell(30000,1);
I access them in a loop several times. Which is the faster way to assign to a particular cell in the array, and why:
My_Cells{i} = 'whatever';
or:
My_Cells{i,1} = 'whatever';
Thanks,
Alon

KSSV on 17 Nov 2016
Edited: KSSV on 17 Nov 2016
N = 300;
t1 = zeros(N,1) ;
t2 = t1 ;
for j = 1:N
ac = cell(j,1) ;
t10 = tic ;
for i = 1:N
ac{i} = rand ;
end
t1(j) = toc(t10) ;
ac = cell(j,1) ;
t20 = tic ;
for i = 1:N
ac{i,1} = rand ;
end
t2(j) = toc(t20) ;
end
plot(t1,'r')
hold on
plot(t2,'b')
After running the above code, which plots the time taken by each method (red for a{i}, blue for a{i,1}), I think a{i,1} is better.

#### 1 Comment

tic toc is a bit of an unreliable way to measure speed. My first thought was that, given the consistency of the results, any vagaries of tic toc being inaccurate would be ironed out, so the general conclusion would still hold.
But then I ran the profiler on your code instead, and on my machine it reports 0.09-0.10 s spent on the 90,000 calls to
ac{i} = rand ;
line, compared to 0.11s spent on the 90,000 calls to
ac{i,1} = rand ;
line.
That isn't conclusive as to the a{i} approach being faster, but it does suggest that tic toc is not accurate enough here. I don't have time to dig further, especially since the difference is negligible (one run of the profiler gave 0.09 s, the next gave 0.10 s).
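For readers who have not used the profiler, here is a minimal sketch of profiling the two assignment lines from KSSV's code instead of timing them with tic toc. It assumes the code is saved and run as a script file (so the report can show line-level times) and that `profile viewer` is available to open the interactive report. Unlike KSSV's original, it preallocates all N cells before both loops; that is an assumption made here so that neither loop grows `ac`.

```matlab
% Sketch: line-level profiling of the two indexing styles from KSSV's loops.
% Preallocating N cells (not j) is an assumption, so neither loop grows ac.
N = 300;
profile on
for j = 1:N
    ac = cell(N, 1);
    for i = 1:N
        ac{i} = rand;        % linear indexing
    end
    ac = cell(N, 1);
    for i = 1:N
        ac{i, 1} = rand;     % (row, column) subscript indexing
    end
end
profile off
p = profile('info');         % raw profiling data as a struct
% profile viewer             % uncomment to open the per-line report
```

In the report, compare the total time and call count of the two `ac{...} = rand` lines rather than the time of the whole script.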

Guillaume on 17 Nov 2016
Edited: Guillaume on 17 Nov 2016
In theory, the linear indexing My_vector{i} should be faster, since in the end MATLAB needs a single offset from the start of the array to find the memory position of the element (computer memory is linear, not 2D). In practice, it probably makes no difference: the optimiser probably sees that the second index is the constant 1 and converts My_vector{i, 1} to My_vector{i}.
However, the first version is less typing, makes it explicit that you're accessing a vector and, more importantly, works whether the vector is a row or a column vector (whereas the other one will error when reading from a row vector, and will silently grow it into a matrix when assigning). Since it is therefore more flexible, I don't see the point in using My_vector{i, 1}, regardless of speed.
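The column-major argument can be checked directly. MATLAB stores arrays column by column, so element (r,c) of an m-by-n array sits at linear index r + (c-1)*m, which for a column vector (c = 1) is just r. The sketch below (all variable names are illustrative) uses `sub2ind` to confirm this, then shows the row-vector case: reading `row{3,1}` raises an index error, while assigning to it grows the array into a 3-by-5 matrix instead of erroring.

```matlab
% Sketch: linear index i and subscript (i,1) name the same element of a
% column array, because element (r,c) of an m-by-n array is r + (c-1)*m.
col = cell(5, 1);
col{3} = 'hello';
sameElement = isequal(col{3}, col{3, 1});   % same element, so true
k = sub2ind(size(col), 3, 1);               % 3 + (1-1)*5 = 3

% On a row vector only the linear form still works for reading.
row = cell(1, 5);
row{3} = 'hello';                           % fine: linear index 3
readFailed = false;
try
    x = row{3, 1};                          % row 3 does not exist in a 1-by-5 array
catch err
    readFailed = true;
    disp(err.message)                       % index-out-of-bounds error
end

% Assigning with (3,1) does not error: it silently grows the array.
row{3, 1} = 'oops';
disp(size(row))                             % now 3-by-5
```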

Guillaume on 17 Nov 2016
On my machine your code shows that a{i} is slightly faster than a{i, 1}. As I said, the difference should be marginal. I suspect that other factors (such as processor caching) have more effect than the indexing method.
In any case, in my opinion, speed is the wrong focus altogether, since the difference should always be marginal. Code clarity and a lower risk of bugs mean that a{i} should always be preferred, unless for some reason the code must only work with column vectors.
KSSV on 17 Nov 2016
Yes, the difference is marginal. Indexing with a{i} is perfectly legitimate.
Walter Roberson on 17 Nov 2016
In my tests, the timing varies more from run to run than it differs between the two methods. It looks as if a{i} is possibly slightly faster, but it is difficult to be sure.
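One way to check this observation is to repeat each measurement many times and compare the gap between the medians with the run-to-run spread. The sketch below does this with tic/toc, alternating which method runs first so that a one-off warm-up cost does not always land on the same method. The values of n and nRuns and the odd/even alternation are illustrative assumptions, not anything from the thread. If `gap` is not clearly larger than `spread`, the two methods cannot be told apart on that machine.

```matlab
% Sketch: is the gap between the methods larger than run-to-run noise?
% n, nRuns and the alternating order are illustrative assumptions.
n = 30000;
nRuns = 21;
tLin = zeros(nRuns, 1);
tSub = zeros(nRuns, 1);
for r = 1:nRuns
    if mod(r, 2) == 1
        order = [1 2];       % a{i} first on odd runs
    else
        order = [2 1];       % a{i,1} first on even runs
    end
    for m = order
        c = cell(n, 1);      % fresh preallocated array for every timing
        t0 = tic;
        if m == 1
            for i = 1:n, c{i} = i; end
            tLin(r) = toc(t0);
        else
            for i = 1:n, c{i, 1} = i; end
            tSub(r) = toc(t0);
        end
    end
end
gap    = abs(median(tLin) - median(tSub));
spread = max(std(tLin), std(tSub));
fprintf('median gap: %.3g s, largest std: %.3g s\n', gap, spread);
```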

Alon Rozen on 17 Nov 2016
Thank you all :)
You encouraged me to check it out with code (I was not familiar with tic/toc). I wrote somewhat different code because in KSSV's code the cell array is created inside the test loop, whereas I am interested in the case where the cell array already exists.
Please check out the following code; it yields an interesting result:
% Initialization:
clear all;
Num_Of_Runs = 300;
Num_Of_Cells = 1000;
Run_Time_1 = zeros(Num_Of_Runs,1);
Run_Time_2 = zeros(Num_Of_Runs,1);
My_Cells = cell(Num_Of_Cells,1);
% Run test:
for i = 1:Num_Of_Runs
T_1 = tic;
for j = 1:Num_Of_Cells
my_Cells(j) = rand;
end
Run_Time_1(i) = toc(T_1);
T_2 = tic;
for j = 1:Num_Of_Cells
my_Cells(j,1) = rand;
end
Run_Time_2(i) = toc(T_2);
end
% Time ratios:
M_T_Ratio = mean(Run_Time_1)/mean(Run_Time_2); % Mean time ratio
First_Ratio = Run_Time_1(1)/Run_Time_2(1); % First time ratio
% Plot results:
figure;
plot(Run_Time_1,'*r');
hold on;
plot(Run_Time_2,'*b');
grid on;
title(['Mean ratio: ', num2str(M_T_Ratio), ' First ratio: ', num2str(First_Ratio)]);
I ran this program with cell arrays of different sizes, from 100 to 100,000, and the results are clear: My_Cells{i} is the faster way (the time ratio is a hint above 0.6), but the first run is slower(!) by a factor of approximately 1.5.
Interesting!

You have to be very careful with this kind of test, though I wholeheartedly agree with doing them where it is useful, hence this long-winded reply to a topic that was already mostly concluded! In your case, the code where you use the
(j,1)
syntax is creating an ever-growing 2D matrix because you got the indexing the wrong way round (a perfect example of what Guillaume was saying about not having to worry whether it is a row or column vector).
At the end of your first for loop, my_Cells (which, incidentally, is not a cell array but just a numeric array, where the runtime is even faster) is a 1x1000 vector.
You then run a for loop putting values in at each (j,1), which keeps resizing your matrix until it is 1000x1000 at the end.
Change that line to
my_Cells(1,j) = rand;
and over the size of data tested the results are very similar.
Again, running the profiler, both assignment lines take 0.02 s (rounded) for 300,000 calls, so they are indistinguishable.
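The resizing described in this reply can be reproduced at a small scale. In the sketch below, n = 5 and the names v and w are illustrative; v starts empty, standing in for the never-preallocated my_Cells, and w shows the (1,j) fix applied to a preallocated row.

```matlab
% Sketch of the resizing bug, scaled down to n = 5.
% v starts empty, standing in for the never-preallocated my_Cells.
n = 5;
v = [];
for j = 1:n
    v(j) = rand;         % linear index on an empty array grows a 1-by-n row
end
sizeAfterLinear = size(v);       % [1 5]
for j = 1:n
    v(j, 1) = rand;      % (j,1) on a 1-by-n row appends zero-filled rows
end
sizeAfterSubscript = size(v);    % [5 5]

% The fix: index a row vector as (1,j), here on a preallocated row.
w = zeros(1, n);
for j = 1:n
    w(1, j) = rand;      % stays 1-by-n
end
```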
As discussed above, speed here is really not an issue worth worrying about, but this type of testing is something I do a lot in places that are more time-sensitive, so I certainly encourage it, with the caveat I mentioned at the top: be very careful that you are comparing techniques under equal conditions, e.g.
• Don't use tic toc; it is inaccurate. Use the profiler or the timeit function.
• If you don't use timeit, you must make sure yourself that the conditions are the same for the tests of both techniques, i.e. don't have one where the array is being resized in the loop and another where it isn't.
• Check that the results of both methods are equal; if they aren't, the relative time is irrelevant.
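Following the first and third points, here is a minimal sketch of a `timeit` comparison. It assumes MATLAB R2016b or later, because it relies on local functions at the end of a script file; the helper names `fillLinear` and `fillSubscript` and the size n = 30000 are hypothetical choices. Both helpers preallocate in the same way and store `i` rather than `rand`, so their outputs can be checked for equality before anything is timed.

```matlab
% Sketch: timeit-based comparison of a{i} and a{i,1}.
% Assumes MATLAB R2016b or later (local functions in a script file).
n = 30000;
% Check that both methods produce the same result before timing them.
assert(isequal(fillLinear(n), fillSubscript(n)));
tLinear    = timeit(@() fillLinear(n));
tSubscript = timeit(@() fillSubscript(n));
fprintf('a{i}: %.3g s, a{i,1}: %.3g s, ratio %.2f\n', ...
        tLinear, tSubscript, tLinear / tSubscript);

% Both hypothetical helpers preallocate identically, so only the
% indexing style differs between them.
function c = fillLinear(n)
    c = cell(n, 1);
    for i = 1:n
        c{i} = i;        % linear indexing
    end
end

function c = fillSubscript(n)
    c = cell(n, 1);
    for i = 1:n
        c{i, 1} = i;     % row/column subscript indexing
    end
end
```

`timeit` calls each function handle several times and returns the median, which makes it less sensitive to one-off effects than a single tic/toc pair.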
I had to change a few more things in your code above on a further look:
You create My_Cells as a cell array, but don't use it. Instead you use my_Cells, which is a numeric array and, because it is not preallocated, grows within the first loop, which is never efficient. That, in addition to the change mentioned above, makes it a fairer test, though I wouldn't swear that the conditions are now equal for both techniques, as I have only gone over it quickly.
Alon Rozen on 20 Nov 2016
Sorry Adam, in my code I mistakenly wrote '()', while in the real code I used for testing it was '{}' (I had to retype the code here since I cannot copy it directly from my MATLAB computer). So it seems that the difference is indeed negligible in practice, and the more intuitive My_Cells{i} is preferable for code simplicity. The small difference might come from differences between the computers.