A single precision matrix multiplication problem

Script:
a = [-0.1708981, -0.1415759, -0.0727073, -0.0554292;
      0.0000000, -0.1715775, -0.1147511, -0.1098115;
      0.0000000,  0.0000000, -0.4013846, -0.3982548;
      0.0000000,  0.0000000,  0.0000000, -0.0430463];
a = single(a);
b = [16557086; 2829575; 3241423; 7206131];
b = single(b);
y = a*b;                        % built-in matrix multiply
z = zeros(4,1);
z = single(z);
for ii = 1:4                    % the same product, written as explicit loops
    for jj = 1:4
        z(ii) = a(ii,jj)*b(jj) + z(ii);
    end
end
c = y - z;
plot(c)                         % => it's not all 0
The algorithm says y and z should be the same, but c is not all 0. Why is this?
Even multiplying an individual row of a by b gives a slightly different result.
  9 Comments
Paul on 19 May 2023
I checked the Release notes from 2019a onwards and didn't see any mention of changes to mtimes, which is surprising if such a heavy-use function actually had a behavior change. I didn't check all of the bug fix pages, but I'd be surprised to find it there, at least relative to this issue.
James Tursa on 19 May 2023
Edited: James Tursa on 19 May 2023
"... Is it possible mtimes works differently (order of computations) on different hardware? ..."
Yes. The function mtimes( ) for full double and single matrices is performed in the background by a multi-threaded BLAS library. Any time that library changes, you should expect there might be small differences in the outputs. Different MATLAB versions might use different libraries, and different machines/hardware might use different libraries or call different routines in the libraries (e.g., symmetric vs generic), or maybe the same library is used but with a different number of threads available. All of this can affect the order of computations. MATLAB used to get its BLAS & LAPACK libraries from a third party; I haven't checked lately, but I presume it still does. Bottom line: you shouldn't expect floating point calculations to necessarily match exactly when comparing different MATLAB environments.
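For what it's worth, you can check which BLAS your MATLAB is using and how many computational threads are available (a quick sketch; version('-blas') and version('-lapack') are long-standing but lightly documented calls):
version('-blas')   % reports the BLAS library MATLAB linked against
version('-lapack') % same for LAPACK
maxNumCompThreads  % current number of computational threads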


Accepted Answer

John D'Errico on 18 May 2023
Edited: John D'Errico on 18 May 2023
Why are you confused? NEVER trust the least significant bits of a floating point number.
The numbers in a are singles. So the least significant bits in a are of magnitude
eps('single')
ans = single 1.1921e-07
Now you add and subtract them, multiplying by numbers that are on the order of 1e7, so rounding errors of size eps('single') get magnified to roughly 1e7*eps('single'), which is on the order of 1. Do you appreciate that when MATLAB does the computations you wrote, there is no presumption that it will add and subtract them in EXACTLY the same sequence as the explicit for loops you wrote? And that means, again: NEVER TRUST THE LEAST SIGNIFICANT BITS OF A FLOATING POINT RESULT.
The difference is on the order of 1; here, 0.25. WHAT A SURPRISE! NOT.
Floating point arithmetic is NOT exact mathematically correct arithmetic. In fact, it is just an approximation to exact mathematically correct arithmetic. A very good approximation in most cases. (Slightly less good for singles than doubles, but the difference is irrelevant in this respect.) As an approximation, there are always tiny errors made. And then when you magnify those errors, you can find them, IF you look very closely.
So, just for kicks, suppose I asked you to compute the result
2/3 - 1/3 - 1/3
BUT, you must use 4 digit decimal arithmetic, first approximating each number as a 4 digit decimal number. You would now get:
0.6667 - 0.3333 - 0.3333 = 0.0001
So even though the answer is mathematically zero, when we use a finite number of digits in any form, the result need not be zero.
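You can mimic that 4-digit experiment directly in MATLAB with round(x,4,'significant') applied after every operation (a rough sketch of decimal rounding, not an exact model of a 4-digit machine):
p = round(2/3, 4, 'significant')   % 0.6667
q = round(1/3, 4, 'significant')   % 0.3333
round(p - q - q, 4, 'significant') % 1.0000e-04, not 0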
Does order make a difference?
x = single([0.1 0.2 0.3]);
x(3) - x(1) - x(2)
ans = single 1.4901e-08
-(x(1) + x(2)) + x(3)
ans = single 0
Again, NEVER TRUST THE LEAST SIGNIFICANT BITS OF A RESULT. In both cases, the answer is effectively zero, to within a tolerance of eps('single').
The answer is to learn to use tolerances. Learn to not trust those least significant bits. Never test for exact equality of floating point numbers, at least not until you know enough about them that you fully understand when you can.
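For example, a tolerance-based comparison of the original y and z might look like this (a sketch; the factor 4 is an arbitrary safety margin, not a magic constant):
tol = 4 * eps('single') * max(abs(y), abs(z)); % scale the tolerance to the data
all(abs(y - z) <= tol)  % should return true for roundoff-sized differences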
  9 Comments
John D'Errico on 19 May 2023
There is no reason to assume that different versions of MATLAB MUST use exactly the same BLAS version. Matrix multiplications are performed by the BLAS (Basic Linear Algebra Subprograms). MATLAB farms that out to highly optimized code in the BLAS.
The end result is that subtle things can influence the EXACT order of arithmetic operations, and as I have shown several times, the EXACT order of floating point adds and subtracts can cause tiny differences in the result. That is something you need to learn to not worry about. NEVER test for exact equality of floating point numbers, because the way they were generated can easily be influenced by things not under your control.
Again, remember that floating point arithmetic is only an approximation to real arithmetic. Think of it as a model. But as a model, it can suffer from tiny flaws. In the case of floating point arithmetic, the flaws are in the least significant bits. That is something you need to live with, unless you are willing to operate in infinite precision. And infinite precision will take too long to use for any computation.
Even in the case of high precision tools, like the HPF code I have posted on the file exchange, it is still floating point arithmetic. So there are still flaws in the least significant bits, but now the depth of those flaws drops down by many orders of magnitude. You cannot get around that.
Walter Roberson on 19 May 2023
Note by the way that the high performance libraries may be different depending on the processor line, such as Intel vs AMD.


More Answers (1)

James Tursa on 19 May 2023
Edited: James Tursa on 22 May 2023
Somewhat related side comments:
In general, the BLAS/LAPACK libraries do not contain mixed type argument routines. E.g., there are no BLAS matrix multiply routines to multiply a single precision matrix with a double precision matrix. To accomplish this with BLAS routines one must first make a temporary copy of the single matrix as a double matrix and then call the appropriate BLAS routine. This of course will chew up extra time and memory. The only exceptions to this are some accumulation routines (e.g., dot product of single precision inputs but using a double precision accumulator). That being said, I don't know if MATLAB ever makes use of these extra precision accumulator routines that are available.
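You can see a hint of this at the MATLAB level: a mixed-type multiply is allowed, and the result comes back as single, which suggests a conversion happens somewhere around the BLAS call (an inference from the behavior described above; the internal details are not documented):
As = single(rand(4));  % single matrix
Bd = rand(4);          % double matrix
class(As * Bd)         % 'single'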
Related to that, there are also no mixed real/complex argument routines in the BLAS/LAPACK libraries. Either all arguments have to be real or all arguments have to be complex. And all of the complex variables are assumed to be interleaved memory model (this was true even back in the R2017b and earlier days).
MATLAB R2017b and prior used a separate complex memory model, meaning the real part of a variable was held in contiguous memory that was separate from the imaginary part which was also held in contiguous memory. For MATLAB R2018a and later this changed to an interleaved complex memory model, where the real part is interleaved with the imaginary part in memory. This now aligns better with the BLAS/LAPACK libraries and how complex variables are typically held in memory in other languages such as Fortran and C/C++, but has implications for how the BLAS/LAPACK routines can be used in the background. A couple of examples:
(Real Matrix) * (Complex Matrix)
R2017b and earlier: This can be accomplished with a series of Real BLAS routine calls on the individual real and imaginary parts. No need for temporary copies.
R2018a and later: This requires that a temporary copy of the real matrix be made as an interleaved complex matrix with imaginary part 0 so that the Complex BLAS routine can be used. This takes extra time and memory compared to R2017b and earlier, and lots of unnecessary multiplies by 0 are done (which might give some NaN results that don't appear in R2017b if there are Infs involved, since Inf*0 = NaN). A timing sketch of this case follows the inv example below.
inv(Complex Matrix)
R2017b and earlier: This involves calling complex LAPACK routines, so the separate real and imaginary parts of the variable must first be copied into an equivalent interleaved variable. Then the appropriate LAPACK routines are called, and the interleaved result must be copied back into a separate-memory-model variable. Takes extra time and memory compared to R2018a and later.
R2018a and later: This can be accomplished with direct complex LAPACK routine calls. No need for temporary copies.
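Here is a rough way to see the temporary-copy cost of the (Real Matrix) * (Complex Matrix) case on R2018a and later (a sketch; the internal copy is inferred from the description above, and timings will vary by release, machine, and BLAS):
Ar = rand(3000);                      % real matrix
Bc = complex(rand(3000), rand(3000)); % interleaved complex matrix
tic; C1 = Ar*Bc;          toc         % MATLAB handles the mixed types internally
tic; C2 = complex(Ar)*Bc; toc         % force the complex copy explicitly
isequal(C1, C2)                       % may or may not match exactly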
Symmetric Matrix Results:
A = rand(5000);
AT = A'; % Explicit transpose constructed in memory
tic; x = A' * A; toc % Symmetric matrix multiply routine called, A' not explicitly formed
Elapsed time is 0.737080 seconds.
tic; y = AT * A; toc % Generic matrix multiply routine called using explicitly formed A'
Elapsed time is 1.300470 seconds.
isequal(x,y)
ans = logical 1
isequal(x,x') % Strictly symmetric result
ans = logical 1
isequal(y,y') % Might not be strictly symmetric result
ans = logical 1
So, with this last example, you can influence which BLAS routines are called in the background by how you write your code. Care must be taken to maintain strict symmetry of results. In the A' * A example, MATLAB can recognize the symmetry of the inputs and call a symmetric routine in the background without explicitly forming the transpose A'. That takes less time and guarantees a strictly symmetric result. In the AT * A example, the transpose A' is explicitly formed, but MATLAB has no clue that AT is the exact transpose of A, so a generic matrix multiply routine is called, which takes longer and doesn't guarantee a strictly symmetric result. In that last case we happened to get strict symmetry, but this depends heavily on your MATLAB/BLAS version and can't be guaranteed.
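If strict symmetry of a result matters downstream, a common fix is to symmetrize explicitly (this perturbs the least significant bits but is exactly symmetric by construction):
y = (y + y')/2; % average with the transpose
isequal(y, y')  % true by construction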
  7 Comments
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
You can look here to get an idea of what the routine signatures are for the BLAS: https://netlib.org/blas/
Note that the symmetric routines work with only the upper or lower triangle. If you want the full result you have to manually write extra code to copy one triangle into the other before returning the result back to MATLAB.
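A MATLAB-side picture of that copy step, starting from a result where only the upper triangle is valid (in a mex routine you would do the equivalent loop in C):
C = triu(rand(5));             % pretend only the upper triangle was filled in
Cfull = triu(C) + triu(C,1).'; % mirror the strict upper triangle into the lower
isequal(Cfull, Cfull.')        % true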
The Level 3 routines do the matrix multiplies, and all of the signatures (generic and symmetric) indicate that the multiplies involving A (and B) accumulate into a separate matrix C. Of course you could pass the pointer to A in the C spot, but my guess is that this is undefined behavior since the routines are likely allowed to accumulate into C in multiple steps. I.e., A would be overwritten in the middle of the matrix multiply algorithms, invalidating the result. It wouldn't be too hard to code up a mex routine to see what the MATLAB BLAS does in this situation.
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
So I went ahead and did this test and got what I expected, an invalid result. With the correct mex code:
>> A = rand(5000);
>> B = rand(5000);
>> C = A*B;
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,Cmex)
ans =
logical
1
With the incorrect inplace mex code:
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,A) % comparing to A because that's where inplace result should be
ans =
logical
0
>> max(abs(A(:)-C(:)))
ans =
1.1143e+27
And here is the code, calling with square matrices so that A and C are the same size for the inplace test:
// Does generic matrix multiply using BLAS dgemm( )
// C = A * B
/* Includes ----------------------------------------------------------- */
#include "mex.h"
#include "blas.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    char TRANS = 'N';
    mwSignedIndex M, N, K, LDA, LDB, LDC;
    double *A, *B, *C;
    double ALPHA = 1.0, BETA = 0.0;
    if( nlhs > 1 )
        mexErrMsgTxt("Too many outputs.");
    if( nrhs != 2 || !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||
        mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ||
        mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||
        mxGetNumberOfDimensions(prhs[0]) != 2 || mxGetNumberOfDimensions(prhs[1]) != 2 )
        mexErrMsgTxt("Need exactly two full double real 2D matrix inputs.");
    M = mxGetM(prhs[0]);
    K = mxGetN(prhs[0]);
    N = mxGetN(prhs[1]);
    LDA = M;
    LDB = mxGetM(prhs[1]);
    LDC = M;
    A = mxGetData(prhs[0]);
    B = mxGetData(prhs[1]);
    if( LDB != K )
        mexErrMsgTxt("Incorrect dimensions for matrix multiplication.");
    plhs[0] = mxCreateDoubleMatrix( M, N, mxREAL );
    C = mxGetData(plhs[0]);
//  dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, C, &LDC); // the correct code
    dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, A, &LDC); // the incorrect inplace code
}
Compiled using this helper function:
function compile_blas(varargin)
libdir = 'microsoft';
lib_blas = fullfile(matlabroot,'extern','lib',computer('arch'),libdir,'libmwblas.lib');
mex(varargin{:},lib_blas,'-largeArrayDims');
end
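Hypothetical usage, assuming the mex source above is saved as dgemm_inplace_test.c:
compile_blas('dgemm_inplace_test.c')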


Release: R2019a