Why does is take O(n^2) to append elements in matlab when it takes O(n) amortized time in theory?

6 views (last 30 days)
Brando Miranda on 6 Apr 2018
Commented: Walter Roberson on 14 May 2021
why doesn't matlab have O(n) amortized time to append elements to a growing array? I thought his was an algorithmically well understood problem. Isn't O(n^2) imply something is wrong with the implementation?

Walter Roberson on 6 Apr 2018
t(1) = 1;
Total items copied to memory this time: 1
Then
t(2) = 2;
[For the purposes of this discussion] MATLAB does not allocate any extra space when you did the t(1) = 1, so MATLAB cannot just grow the memory in-place. Instead it has to go find a block of memory large enough to find 2 elements. When it does that, it copies the existing t(1) into the first part of that new block, and then does the assignment of the 2 to the second element of that block. Then it reduces the reference count on the data block that was holding the 1, which would typically result in that data block being handed back to the list of available blocks.
Total items copied to memory this time: the 1 from the old block, plus the 1 newly copied in, so total 2 this round
Now do t(3) = 3. MATLAB has to go find a block of memory large enough for 3 entries, copy the existing 2 entries into it, make the new assignment, reduce the reference count on the block of 2 entries (which will probably result in the block of 2 being reclaimed.)
Total items copied to memory this time: the 2 from the old block, plus the 1 newly copied in, so total 3 this round
Now do t(4) = 4. Go find enough for 4 entries, copy the 3, assign the 1, disinterest the old block. 3 old copied, 1 new assignment, 4 total memory movements this round.
And so on.
The n'th expansion requires (n-1) copies of old data plus 1 copy of new data, for a total of n items that round.
Now, after the n'th expansion has been done, how much total moves has there been to get to there? 1 + 2 + 3 + 4 + ... n . Which is n*(n+1)/2, which is O(n^2) to have gotten to here from the beginning.
Earlier I said "for the purpose of this discussion" that MATLAB does not allocate any extra space initially. That is not actually true. Really, MATLAB uses two allocation strategies: it has a store of "small blocks" of fixed size, and it has a free list of variable sized blocks (possibly with holes.) If an expansion is being made to memory stored in a "small block" and the result of the expansion will still fit, and the block is not being shared, then the data is just copied into the appropriate place in the small block and the headers are updated to increase the size; but if adding the new data would overflow the fixed size then MATLAB moves onto the free list that has the O(n^2) behaviour.
Plausibly MATLAB uses a hybrid scheme, where each time it allocates a block on the chain, it permits the rest of the block to be filled up as the user writes new information, provided the block is not shared. If the block size holds N data elements, then this permits N elements in a row to be written, which is more efficient. But this transforms it into a n*(n/N) behaviour, which is still O(n^2), just with a smaller constant of proportionality, which O notation does not care about.
The newer releases of MATLAB have work done underneath to try to predict growth of arrays and automatically pre-allocate to a size as large as it can prove that it will need (provided the user doesn't interrupt and nothing does an assignin() to clobber it.)
Walter Roberson on 14 May 2021
As long as your data is represented as consecutive memory with the first entry of the second column being directly after the last entry of the first column, then in order to append new rows, you must move data around. Potentially you could move it "in-place" if your memory block was large enough (you start from the end and pull things towards the end by varying amounts), but the data still needs to be moved.
However, if you are appending new columns, then those would go after the end of all existing data, and so as long as there was space in the block, no copying would be needed.
See though what I wrote above about hybrid schemes: you have to look carefully to see if you might be seeing quadratic behaviour with a lower constant of proportionality.