# Matrix product and inversion speed up

10 views (last 30 days)
Angelo Cuzzola on 18 Jun 2020
Commented: Angelo Cuzzola on 19 Jun 2020
Hi,
my question is simple. I have a function where the most computational intesinve line is this simple matrix operation
iF = iR - iRZ*(iP - Z' *iRZ )\iRZ';
Dimesions of the involved objects vary at each call, but tipically are of the orders of magnitude listed here:
iR -> diagonal 40kx40k (saved as sparse)
iRZ -> 40k x 100 (saved as sparse)
Z -> 40k x 100
iP -> 100x100
Is there something one can do to speed up the code?
Thanks

Angelo Cuzzola on 19 Jun 2020
Well, but the proper equivalent of that set of operation is not the one you are proposing. Instead the two equivalent forms are:
iF = iR - (iRZ/(iP + Z_t'*iRZ))*iRZ';
iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
Here is a comparison (10k x 150, typical dimensions):
tic
iF = iR - (iRZ/(iP + Z_t'*iRZ))*iRZ';
toc
tic
iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 1.107791 seconds.
Elapsed time is 1.081249 seconds.
Christine Tobler on 19 Jun 2020
What are you doing with the matrix iF in the next step? Since this is a dense 1e4x1e4 matrix, it will be comparatively expensive to compute, and just avoiding to store this explicitly altogether would probably be the fastest thing.
For example, if your next step is to do matrix-vector multiplication with iF, you could replace
y = iF*x
with
y = iR*y - iRZ*((iP - Z' *iRZ )\(iRZ'*y));
For some other possible operations, you will need to store the complete iF as a dense matrix - it depends on that next step.
Angelo Cuzzola on 19 Jun 2020
Well, this seems to quite improve the situation. Thanks
I have always been taught the opposite, namely to do expensive computations once and store the results. I couldn't imagine that the time consuming part was the storing part.
tic
iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
Au = A + PZ*(iF*V);
Pu = P - PZ*(iF*PZ');
toc
Elapsed time is 2.076231 seconds.
tic
PZRZ = PZ*iRZ;
Au = A + PZ*(iR*V) - PZRZ*((iP + Z_t'*iRZ)\iRZ'*V);
Pu = P - PZ*(iR*PZ') + PZRZ*((iP + Z_t'*iRZ)\iRZ'*PZ');
toc
Elapsed time is 0.861536 seconds.
tic
PZRZ = PZ*iRZ;
iPRZ = (iP + Z_t'*iRZ)\iRZ';
Au = A + PZ*(iR*V) - PZRZ*(iPRZ*V);
Pu = P - PZ*(iR*PZ') + PZRZ*(iPRZ*PZ');
toc
Elapsed time is 0.774753 seconds.

John D'Errico on 19 Jun 2020
Edited: John D'Errico on 19 Jun 2020
What you seem not to appreciate is that 90-95% is NOT very sparse, not when you are talking matrix factorizations (i.e., backslash.) Backslash does not compute an inverse, although the inverse of a general not terribly sparse matrix will also probably be almost full too.
Inside this expression:
iRZ*(iP - Z' *iRZ )\iRZ'
we see a 100x100 matrix in the parens. (Roughly, since you say the sizes changes somewhat arbitrarily.) That matrix is what controls a LOT, since it will likely be fairly non-sparse, given the components. But once that 100x100 matrix is decomposed using a matrix factorization, it likely becomes close to a full matrix, or a product of a full lower and upper triangular pair.
In that case, then if you multiply that result using a pre multiply by iRZ as well as an implicit multiply by iRZ', you also get something fairly full.
My question in the comments was to ask what is the sparsity of those submatrices. That is, compute
A = iP - Z' *iRZ ;
nnz(A)/numel(A)
You might also then try for that same A:
[L,U] = lu(A);
nnz(L)/numel(L)
nnz(U)/numel(U)
If each of those matrices is nearly 50% non-zero, then they are essentially full triangular matrices.
Next, on the parent computation:
A = iRZ*(iP - Z' *iRZ )\iRZ';
nnz(A)/numel(A)
Those are the things you need to do. You should not be surprised to see a great deal of fill-in. In fact, consider yourself lucky if that does not happen.
You might consider sparsity enhancing permutations, applied to the inner 100x100 matrix, thus tools like colamd, dmperm, etc. (Sorry, but it has been many years since I was actively using those tools. They can help a great deal when used properly.)
Remember that once things start to become too filled in, a sparse matrix computation can start to become LESS efficient than a full one would have been. You might test if making that inner square matrix a full matrix helps too.
Are these things important? Yes.
For example:
A = rand(100);
As = sparse(A);
B = sprand(10000,100,0.01);
So B is 99% sparse. A is essentially a full matrix. And remember that even if A is only fairly full, a factorization of A will probably be quite close to full anyway.
Consider the term
C = A\B';
nnz(C)/numel(C)
ans =
0.6299
It does not matter whether or not A is sparse, backslash created a matrix that was 63% non-zero. It is effectively full in the world of sparse matrices. In fact, if we look at what happens with
Cs = As\B';
>> whos C Cs
Name Size Bytes Class Attributes
C 100x10000 8000000 double
Cs 100x10000 10158408 double sparse
So that 63% non-zero matrix used roughly 25% MORE space to store than the comparable full version.
Now compare times:
timeit(@() B*(A\B'))
ans =
0.3445460668125
timeit(@() B*(As\B'))
ans =
1.5696587328125
Finally, consider this:
Bf = full(B);
timeit(@() Bf*(A\Bf'))
ans =
0.1513420138125
And those times were only for a 10000x100 problem, and a sparse version of B that was 99% sparse, not just 90-95%.
As long as I could store that final result in memory as a full matrix, I actually gained by just keeping the computation completely full. To be useful, sparse matrices need to be seriously sparse. As well, sparsity can disappear quickly.
So if that inner part is full, or even close to being full, then sparsity of B does not help. In fact using sparse matrices may be LESS efficient, using more storage in those products, and using more time. And you need to look at the sparsity of the computed results. At least consider fill-in reducing permutations, as it is fill-in that can be a killer for sparse matrices.

#### 1 Comment

Angelo Cuzzola on 19 Jun 2020
Cc = (iP + Z_t'*iRZ);
Cc_sparseness = nnz(Cc)/numel(Cc)
Cc_sparseness = 0.0198
[L U] = lu(Cc);
L_sparseness = nnz(L)/numel(L)
L_sparseness = 0.5033
U_sparseness = nnz(U)/numel(U)
U_sparseness = 0.5033
Sparseness seems to solve a bit (and I need it to save memory).
Non-sparse case:
tic
iRZ = iR*Z_t;
Cinv = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 3.856927 seconds.
whos Z_t iRZ Cinv
Name Size Bytes Class Attributes
Z_t 10486x151 12667088 double
iRZ 10486x151 12667088 double
Cinv 10486x10486 879649568 double
Sparse case:
tic
Z_t = sparse(Z_t);
iRZ = sparse(iR*Z_t);
Cinv = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 2.119886 seconds.
whos Z_t iRZ Cinv
Name Size Bytes Class Attributes
Z_t 10486x151 336768 double sparse
iRZ 10486x151 336768 double sparse
Cinv 10486x10486 879649568 double
I just hoped that, being the shape of iP + Z_t'*iRZ quite regular (see below), there could be a smart way to invert it.
1.0e+03 *
Columns 1 through 9
4.8288 0.0067 -0.0170 -0.0010 -0.0007 -0.0071 -0.0056 0.0119 0.0160
0.0067 0.0477 0 0 0 0 0 0 0
-0.0170 0 0.0768 0 0 0 0 0 0
-0.0010 0 0 0.0666 0 0 0 0 0
-0.0007 0 0 0 0.0732 0 0 0 0
-0.0071 0 0 0 0 0.0810 0 0 0
-0.0056 0 0 0 0 0 0.0405 0 0
0.0119 0 0 0 0 0 0 0.0765 0
0.0160 0 0 0 0 0 0 0 0.0693

Bjorn Gustavsson on 19 Jun 2020
My comment still seems relevant. A simple commandline test gives:
tic,AA = ones(3e4,2)*(ones(2,3e4)*ones(3e4,1));toc
% Elapsed time is 0.001027 seconds.
tic,BB = (ones(3e4,2)*ones(2,3e4))*ones(3e4,1);toc
% Elapsed time is 2.031760 seconds.
isequal(AA,BB)
% logical 1
If this is run inside a function where the JIT-optimization can work its wonders the multiplication might be a bit more cunning. But when you know the sizes ofthe matrices that should be multiplied together you can guide the order of the multiplications so that you avoid outer-products producing large intermediate matrices that then vanishes to produce a smaller matrix or vector in the end.
HTH

Angelo Cuzzola on 19 Jun 2020
Ok. But I do not see how this applies to my case, where there is an inversion. It is possible to invert the median object with the function inv() and then you can exploit your associativeness, but inv() is a lot more inefficient. Here is an example (10k x 150 dimesions)
tic
iF = iR - iRZ*(inv(iP + Z_t'*iRZ)*iRZ');
toc
Elapsed time is 17.203595 seconds.
tic
iF = iR - (iRZ*inv(iP + Z_t'*iRZ))*iRZ';
toc
Elapsed time is 7.220762 seconds.
tic
iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 1.312007 seconds.
Bjorn Gustavsson on 19 Jun 2020
That's a downer, I hoped that inverting a 100x100 matrix wouldn't be that expensive. Perhaps you can have some use of Factorize.