For-loops are now faster than some of the simplest vectorized statements. Why do we still need so much deep copying?

It is disconcerting to me that the for-loop implementation of operations like x(i+1)*x(i) is now faster than the vectorized implementation (see below). One of the main purposes of MATLAB is to make it possible to do such operations optimally with brief, vectorized syntax.
Moreover, this problem arises entirely because subsref operations like x(2:end) create a deep copy of the data. Can't the parser determine that the copy is unnecessary when the indexed variable appears only on the right-hand side of the assignment? And if so, isn't it high time this was implemented?
N=1e8;
x=rand(1,N);
y=zeros(1,N);
%Vectorized implementation
tic;
y(1:N-1)=x(2:N).*x(1:N-1);
toc
Elapsed time is 1.574006 seconds.
%For-loop implementation
tic;
for i=1:N-1
y(i)=x(i+1).*x(i);
end
toc
Elapsed time is 0.576191 seconds.
%Remove contribution of deep copies
X1=x(2:end); X2=x(1:end-1);
tic;
y(1:N-1)=X1.*X2;
toc
Elapsed time is 0.441418 seconds.
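For a more stable comparison than a single tic/toc, timeit could also be used. A sketch, assuming R2016b or newer so that loopProduct (an illustrative helper, not part of the original timings) can sit at the end of the script; note timeit runs the function several times, so this takes a while at N=1e8:
N=1e8;
x=rand(1,N);
tVec = timeit(@() x(2:N).*x(1:N-1));  % vectorized, includes the temporary index copies
tLoop = timeit(@() loopProduct(x));   % for-loop version via the helper below
fprintf('vectorized: %.3f s, loop: %.3f s\n', tVec, tLoop);
function y = loopProduct(x)
% For-loop version of y(i) = x(i+1)*x(i), with preallocation
N = numel(x);
y = zeros(1,N-1);
for i = 1:N-1
    y(i) = x(i+1)*x(i);
end
end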
  1 Comment
Jan on 12 Nov 2021 (edited 13 Nov 2021)
By the way: Timings in R2009a to R2015b:
% Elapsed time is 1.998664 seconds. Vectorized with copy
% Elapsed time is 4.307474 seconds. Loop
% Elapsed time is 0.798561 seconds. Vectorized without copy
R2016b:
% Elapsed time is 1.142006 seconds. Vectorized with copy
% Elapsed time is 0.733920 seconds. Loop
% Elapsed time is 0.633730 seconds. Vectorized without copy
R2018b:
% Elapsed time is 0.856713 seconds. Vectorized with copy
% Elapsed time is 0.500901 seconds. Loop
% Elapsed time is 0.427571 seconds. Vectorized without copy
Win 10, i7, 16 GB RAM.
Even the vectorized version now needs less than half the time it used to.

Accepted Answer

James Tursa on 11 Nov 2021 (edited 11 Nov 2021)
I have read rumors in the past that TMW was tinkering with the idea of sub-array references, or some type of limited pointer capability, but nothing official yet, of course. The whole endeavor becomes problematic in a dynamic variable scheme like MATLAB's because the underlying memory can be pulled out from under you at any time; i.e., the source array can change and free the memory that the sub-array reference depends on. To keep the sub-array valid you have to save a shared copy of the source off to the side and keep track of its connection to the sub-arrays. Only when all sub-arrays connected to the source array are released can you finally free the background shared source array you saved off to the side. It is a bookkeeping headache, but that is in fact what I do with my SHAREDCHILD mex submission:
The problem with this submission, however, is that TMW keeps changing the underlying mxArray header, which thwarts my efforts. It used to be that the linked list of all shared data copies was visible in the header; not any more. So SHAREDCHILD is now broken for the latest versions of MATLAB, and without that linked list I don't know if I can repair it. Maybe I can get by with just the counter that is now visible, but I haven't had the time to work on this yet.
"Isn't it high time this was implemented?"
My obvious answer is yes, of course. These temporary deep copies of large variables have long been a source of memory and timing issues in MATLAB.
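For readers unfamiliar with the shared-data machinery being described, here is a minimal sketch (not SHAREDCHILD itself) of the copy-on-write behavior that any sub-array reference scheme would have to cooperate with:
x = rand(1,1e8);
tic; y = x; toc        % fast: y shares x's data, nothing is copied yet
tic; y(1) = 0; toc     % slower: the first write forces the actual deep copy
tic; y(2) = 0; toc     % fast again: y now owns its own data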
  15 Comments
Matt J on 19 Nov 2021
"I have read rumors in the past that TMW was tinkering with the idea of sub-array references, or some type of limited pointer capability, but nothing official yet, of course."
Maybe this counts as something official? From tech-support:
Sorry for keeping you waiting. Here's the updates from our investigation: Our development team is aware of the potential for this optimization and have it on our roadmap and we will be working on it to make it available in the future.
Thank you for bringing this issue to our attention, we will use this example as an input for our design. I am going to mark this case as closed now from a technical support perspective and you will be notified once the feature is implemented in future releases. If you have a new technical support question, please submit a new request here:
Sincerely,
Jijie Zhou
MathWorks Technical Support Department

More Answers (1)

Jan on 13 Nov 2021
I have heard the complaints around the net that MATLAB processes loops very slowly. Then MathWorks introduced JIT acceleration in R6.5 (2002), and since then loops have run at a fair speed.
Between R2015b and R2016b the speed of a simple loop over doubles was improved massively. Nice.
Now the deep data copies are the new bottleneck. The discussion with Walter mentions some serious problems with fixing this. We have seen MathWorks restrict the ability to share variables between nested functions and scripts in the past (the sharing mechanism itself is sketched below); a possible explanation is that they are moving in the direction you are dreaming of.
I suggest calling one of the designers and asking about the future plans. MathWorks does not publish its strategies for the future, but if you ask personally in a private phone call...
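As a small illustration of the sharing mechanism mentioned above (a sketch, not an official workaround): nested functions share the workspace of their parent function, so a large array can be updated without being passed in or copied as an argument. Save this as demoNestedSharing.m:
function demoNestedSharing
x = rand(1,1e7);
scaleInPlace(2)        % updates x without passing it in or returning it
disp(x(1:3))
    function scaleInPlace(k)
        x = k*x;       % this x is the same variable as in demoNestedSharing
    end
end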

Release: R2021b
