A single precision matrix multiplication problem

Script:
a = [-0.1708981, -0.1415759, -0.0727073, -0.0554292;
      0.0000000, -0.1715775, -0.1147511, -0.1098115;
      0.0000000,  0.0000000, -0.4013846, -0.3982548;
      0.0000000,  0.0000000,  0.0000000, -0.0430463];
a = single(a);
b = [16557086; 2829575; 3241423; 7206131];
b = single(b);
y = a*b;                        % built-in matrix multiply
z = zeros(4,1);
z = single(z);
for ii = 1:4                    % the same product, written as explicit loops
    for jj = 1:4
        z(ii) = a(ii,jj)*b(jj) + z(ii);
    end
end
c = y - z;
plot(c)                         % => it's not all 0
The algorithm says y and z should be the same, but c is not all 0. Why is this?
Even multiplying an individual row of a by b gives a slightly different result.
  9 Comments
Paul on 19 May 2023
I checked the Release notes from 2019a onwards and didn't see any mention of changes to mtimes, which is surprising if such a heavy-use function actually had a behavior change. I didn't check all of the bug fix pages, but I'd be surprised to find it there, at least relative to this issue.
James Tursa on 19 May 2023
Edited: James Tursa on 19 May 2023
"... Is it possible mtimes works differently (order of computations) on different hardware? ..."
Yes. The function mtimes( ) for full double and single matrices is performed in the background by a multi-threaded BLAS library. Any time that library changes, you should expect there might be small differences in the outputs. Different MATLAB versions might use different libraries, and different machines/hardware might use different libraries or call different routines in the libraries (e.g., symmetric vs generic), or maybe the same library is used but with a different number of threads available. All of this can affect the order of computations. MATLAB used to get its BLAS & LAPACK libraries from a third party; I haven't checked lately, but I presume it still does. Bottom line: you shouldn't expect floating point calculations to necessarily match exactly when comparing different MATLAB environments.
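For what it's worth, you can check which BLAS your MATLAB is using and how many computational threads are available (a quick sketch; version('-blas') and version('-lapack') are long-standing but lightly documented calls):
version('-blas')   % reports the BLAS library MATLAB linked against
version('-lapack') % same for LAPACK
maxNumCompThreads  % current number of computational threads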


Accepted Answer

John D'Errico on 18 May 2023
Edited: John D'Errico on 18 May 2023
Why are you confused? NEVER trust the least significant bits of a floating point number.
The numbers in a are singles. So the least significant bits in a are of magnitude
eps('single')
ans = single 1.1921e-07
Now you add and subtract them, multiplying by numbers that are on the order of 1e7, so rounding errors of size eps('single') get magnified to roughly 1e7*eps('single'), which is on the order of 1. Do you appreciate that when MATLAB does the computations you wrote, there is no presumption that it will add and subtract them in EXACTLY the same sequence as the explicit for loops you wrote? And that means, again: NEVER TRUST THE LEAST SIGNIFICANT BITS OF A FLOATING POINT RESULT.
The difference is on the order of 1; here, 0.25. WHAT A SURPRISE! NOT.
Floating point arithmetic is NOT exact mathematically correct arithmetic. In fact, it is just an approximation to exact mathematically correct arithmetic. A very good approximation in most cases. (Slightly less good for singles than doubles, but the difference is irrelevant in this respect.) As an approximation, there are always tiny errors made. And then when you magnify those errors, you can find them, IF you look very closely.
So, just for kicks, suppose I asked you to compute the result
2/3 - 1/3 - 1/3
BUT, you must use 4 digit decimal arithmetic, first approximating each number as a 4 digit decimal number. You would now get:
0.6667 - 0.3333 - 0.3333 = 0.0001
So even though the answer is mathematically zero, when we use a finite number of digits in any form, the result need not be zero.
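You can mimic that 4-digit experiment directly in MATLAB with round(x,4,'significant') applied after every operation (a rough sketch of decimal rounding, not an exact model of a 4-digit machine):
p = round(2/3, 4, 'significant')   % 0.6667
q = round(1/3, 4, 'significant')   % 0.3333
round(p - q - q, 4, 'significant') % 1.0000e-04, not 0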
Does order make a difference?
x = single([0.1 0.2 0.3]);
x(3) - x(1) - x(2)
ans = single 1.4901e-08
-(x(1) + x(2)) + x(3)
ans = single 0
Again, NEVER TRUST THE LEAST SIGNIFICANT BITS OF A RESULT. In both cases, the answer is effectively zero, to within a tolerance of eps('single').
The answer is to learn to use tolerances. Learn to not trust those least significant bits. Never test for exact equality of floating point numbers, at least not until you know enough about them that you fully understand when you can.
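For example, a tolerance-based comparison of the original y and z might look like this (a sketch; the factor 4 is an arbitrary safety margin, not a magic constant):
tol = 4 * eps('single') * max(abs(y), abs(z)); % scale the tolerance to the data
all(abs(y - z) <= tol)  % should return true for roundoff-sized differences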
  9 Comments
John D'Errico on 19 May 2023
There is no reason to assume that different versions of MATLAB MUST use exactly the same BLAS version. Matrix multiplications are performed by the BLAS (Basic Linear Algebra Subprograms). MATLAB farms that out to highly optimized code in the BLAS.
The end result is that subtle things can influence the EXACT order of arithmetic operations, and as I have shown several times, the EXACT order of floating point adds and subtracts can cause tiny differences in the result. That is something you need to learn to not worry about. NEVER test for exact equality of floating point numbers, because the way they were generated can easily be influenced by things not under your control.
Again, remember that floating point arithmetic is only an approximation to real arithmetic. Think of it as a model. But as a model, it can suffer from tiny flaws. In the case of floating point arithmetic, the flaws are in the least significant bits. That is something you need to live with, unless you are willing to operate in infinite precision. And infinite precision will take too long to use for any computation.
Even in the case of high precision tools, like the HPF code I have posted on the file exchange, it is still floating point arithmetic. So there are still flaws in the least significant bits, but now the depth of those flaws drops down by many orders of magnitude. You cannot get around that.
Walter Roberson on 19 May 2023
Note by the way that the high performance libraries may be different depending on the processor line, such as Intel vs AMD.


More Answers (1)

James Tursa on 19 May 2023
Edited: James Tursa on 22 May 2023
Somewhat related side comments:
In general, the BLAS/LAPACK libraries do not contain mixed type argument routines. E.g., there are no BLAS matrix multiply routines to multiply a single precision matrix with a double precision matrix. To accomplish this with BLAS routines one must first make a temporary copy of the single matrix as a double matrix and then call the appropriate BLAS routine. This of course will chew up extra time and memory. The only exceptions to this are some accumulation routines (e.g., dot product of single precision inputs but using a double precision accumulator). That being said, I don't know if MATLAB ever makes use of these extra precision accumulator routines that are available.
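You can see a hint of this at the MATLAB level: a mixed-type multiply is allowed, and the result comes back as single, which suggests a conversion happens somewhere around the BLAS call (an inference from the behavior described above; the internal details are not documented):
As = single(rand(4));  % single matrix
Bd = rand(4);          % double matrix
class(As * Bd)         % 'single'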
Related to that, there are also no mixed real/complex argument routines in the BLAS/LAPACK libraries. Either all arguments have to be real or all arguments have to be complex. And all of the complex variables are assumed to be interleaved memory model (this was true even back in the R2017b and earlier days).
MATLAB R2017b and prior used a separate complex memory model, meaning the real part of a variable was held in contiguous memory that was separate from the imaginary part which was also held in contiguous memory. For MATLAB R2018a and later this changed to an interleaved complex memory model, where the real part is interleaved with the imaginary part in memory. This now aligns better with the BLAS/LAPACK libraries and how complex variables are typically held in memory in other languages such as Fortran and C/C++, but has implications for how the BLAS/LAPACK routines can be used in the background. A couple of examples:
(Real Matrix) * (Complex Matrix)
R2017b and earlier: This can be accomplished with a series of Real BLAS routine calls on the individual real and imaginary parts. No need for temporary copies.
R2018a and later: This requires that a temporary copy of the real matrix be made as an interleaved complex matrix with imaginary part 0 so that the Complex BLAS routine can be used. This takes extra time and memory compared to R2017b and earlier, and lots of unnecessary multiplies by 0 are done (which might give some NaN results that don't appear in R2017b if there are Infs involved, since Inf*0 = NaN). A timing sketch of this case follows the inv example below.
inv(Complex Matrix)
R2017b and earlier: This involves calling complex LAPACK routines, so the separate real and imaginary parts of the variable must first be copied into an equivalent interleaved variable. Then the appropriate LAPACK routines are called, and the interleaved result must be copied back into a separate-memory-model variable. Takes extra time and memory compared to R2018a and later.
R2018a and later: This can be accomplished with direct complex LAPACK routine calls. No need for temporary copies.
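Here is a rough way to see the temporary-copy cost of the (Real Matrix) * (Complex Matrix) case on R2018a and later (a sketch; the internal copy is inferred from the description above, and timings will vary by release, machine, and BLAS):
Ar = rand(3000);                      % real matrix
Bc = complex(rand(3000), rand(3000)); % interleaved complex matrix
tic; C1 = Ar*Bc;          toc         % MATLAB handles the mixed types internally
tic; C2 = complex(Ar)*Bc; toc         % force the complex copy explicitly
isequal(C1, C2)                       % may or may not match exactly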
Symmetric Matrix Results:
A = rand(5000);
AT = A'; % Explicit transpose constructed in memory
tic; x = A' * A; toc % Symmetric matrix multiply routine called, A' not explicitly formed
Elapsed time is 0.737080 seconds.
tic; y = AT * A; toc % Generic matrix multiply routine called using explicitly formed A'
Elapsed time is 1.300470 seconds.
isequal(x,y)
ans = logical 1
isequal(x,x') % Strictly symmetric result
ans = logical 1
isequal(y,y') % Might not be strictly symmetric result
ans = logical 1
So, with this last example, you can influence which BLAS routines are called in the background by how you write your code. Care must be taken to maintain strict symmetry of results. In the A' * A example, MATLAB can recognize the symmetry of the inputs and call a symmetric routine in the background without explicitly forming the transpose A'. That takes less time and guarantees a strictly symmetric result. In the AT * A example, the transpose A' is explicitly formed, but MATLAB has no clue that AT is the exact transpose of A, so a generic matrix multiply routine is called, which takes longer and doesn't guarantee a strictly symmetric result. In that last case we happened to get strict symmetry, but this depends heavily on your MATLAB/BLAS version and can't be guaranteed.
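If strict symmetry of a result matters downstream, a common fix is to symmetrize explicitly (this perturbs the least significant bits but is exactly symmetric by construction):
y = (y + y')/2; % average with the transpose
isequal(y, y')  % true by construction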
  7 Comments
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
You can look here to get an idea of what the routine signatures are for the BLAS: https://netlib.org/blas/
Note that the symmetric routines work with only the upper or lower triangle. If you want the full result you have to manually write extra code to copy one triangle into the other before returning the result back to MATLAB.
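A MATLAB-side picture of that copy step, starting from a result where only the upper triangle is valid (in a mex routine you would do the equivalent loop in C):
C = triu(rand(5));             % pretend only the upper triangle was filled in
Cfull = triu(C) + triu(C,1).'; % mirror the strict upper triangle into the lower
isequal(Cfull, Cfull.')        % true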
The Level 3 routines do the matrix multiplies, and all of the signatures (generic and symmetric) indicate that the multiplies involving A (and B) accumulate into a separate matrix C. Of course you could pass the pointer to A in the C spot, but my guess is that this is undefined behavior since the routines are likely allowed to accumulate into C in multiple steps. I.e., A would be overwritten in the middle of the matrix multiply algorithms, invalidating the result. It wouldn't be too hard to code up a mex routine to see what the MATLAB BLAS does in this situation.
James Tursa on 23 May 2023
Edited: James Tursa on 23 May 2023
So I went ahead and did this test and got what I expected, an invalid result. With the correct mex code:
>> A = rand(5000);
>> B = rand(5000);
>> C = A*B;
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,Cmex)
ans =
logical
1
With the incorrect inplace mex code:
>> Cmex = dgemm_inplace_test(A,B);
>> isequal(C,A) % comparing to A because that's where inplace result should be
ans =
logical
0
>> max(abs(A(:)-C(:)))
ans =
1.1143e+27
And here is the code, calling with square matrices so that A and C are the same size for the inplace test:
// Does generic matrix multiply using BLAS dgemm( )
// C = A * B
/* Includes ----------------------------------------------------------- */
#include "mex.h"
#include "blas.h"
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    char TRANS = 'N';
    mwSignedIndex M, N, K, LDA, LDB, LDC;
    double *A, *B, *C;
    double ALPHA = 1.0, BETA = 0.0;
    if( nlhs > 1 )
        mexErrMsgTxt("Too many outputs.");
    if( nrhs != 2 || !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||
        mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ||
        mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||
        mxGetNumberOfDimensions(prhs[0]) != 2 || mxGetNumberOfDimensions(prhs[1]) != 2 )
        mexErrMsgTxt("Need exactly two full double real 2D matrix inputs.");
    M = mxGetM(prhs[0]);
    K = mxGetN(prhs[0]);
    N = mxGetN(prhs[1]);
    LDA = M;
    LDB = mxGetM(prhs[1]);
    LDC = M;
    A = mxGetData(prhs[0]);
    B = mxGetData(prhs[1]);
    if( LDB != K )
        mexErrMsgTxt("Incorrect dimensions for matrix multiplication.");
    plhs[0] = mxCreateDoubleMatrix( M, N, mxREAL );
    C = mxGetData(plhs[0]);
//  dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, C, &LDC); // the correct code
    dgemm(&TRANS, &TRANS, &M, &N, &K, &ALPHA, A, &LDA, B, &LDB, &BETA, A, &LDC); // the incorrect inplace code
}
Compiled using this helper function:
function compile_blas(varargin)
libdir = 'microsoft';
lib_blas = fullfile(matlabroot,'extern','lib',computer('arch'),libdir,'libmwblas.lib');
mex(varargin{:},lib_blas,'-largeArrayDims');
end
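Hypothetical usage, assuming the mex source above is saved as dgemm_inplace_test.c:
compile_blas('dgemm_inplace_test.c')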


Release: R2019a