Can I speed up this matrix multiplication?

Sorry for the basic question. I know that matrix multiplication is highly optimized in MATLAB, but I would still like to find out whether I can speed it up further for this special case.
This is the bottleneck in my code and I have no idea how to improve it. A simplified version of the code is below.
% X, Q are given symmetric matrix
[P, ~] = eig(X);
ans = P*(Q.*(P'*Y*P))*P'; % Y will change in each loop.
X, Y, and Q are 1000x1000 matrices, and in each loop I have to compute four matrix multiplications.
However, P and Q are constant across loops, so I want to know what I could do to speed this code up further.
I have tried gpuArray, but it is not faster than the normal MATLAB * operation (maybe my GPU is not good enough).
Any suggestions would be appreciated.

6 Comments

Please provide some typical inputs. What are the sizes of X and Y? It matters whether they are 10x10 or 1e4x1e4 matrices.
Thanks for the comment. X and Y are 1000x1000 matrices, and X is positive definite.
Q is not sparse. Q is positive (i.e., Q > 0 elementwise) and symmetric.
Is Q low rank, by any chance?
Thanks, it seems Q can indeed be low-rank.


 Accepted Answer

Matt J
Matt J on 10 May 2021
Edited: Matt J on 10 May 2021
For example, the size of Q is 1000x1000, but its rank is 10.
If so, then we can decompose Q into a short sum, Q = sum_{i=1..10} lambda_i*(v_i*v_i'), where the lambda_i and v_i are the non-zero eigenvalues and corresponding eigenvectors of Q. Using the identity (v*v').*M = diag(v)*M*diag(v), and writing D_i = diag(v_i), the computation becomes
ans = sum_{i=1..10} lambda_i * P*D_i*(P'*Y*P)*D_i*P'
or, since P is orthogonal (X is symmetric),
ans = sum_{i=1..10} lambda_i * M_i*Y*M_i, with M_i = P*D_i*P'.
So, you can now pre-compute the matrices M_i and reduce the number of matrix multiplications in each of the above terms to two. Also, you have a 16-core CPU, so the terms can be computed in parallel and summed using parfor.
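A minimal sketch of this pre-computation (the names M, lam, keep and the 1e-10 rank tolerance are illustrative; it assumes P is orthogonal, which holds here because X is symmetric):

```matlab
% One-time pre-computation, outside the loop:
[V, D] = eig(Q);                                  % Q symmetric, low rank
lam    = diag(D);
keep   = find(abs(lam) > 1e-10*max(abs(lam)));    % indices of the non-zero eigenvalues
M      = cell(1, numel(keep));
for i = 1:numel(keep)
    M{i} = P*diag(V(:,keep(i)))*P';               % M{i} = P*diag(v_i)*P' (symmetric)
end

% Inside the loop, each time Y changes:
R = zeros(size(Y));
for i = 1:numel(keep)
    R = R + lam(keep(i))*(M{i}*Y*M{i});           % two 1000x1000 products per term
end
% R matches P*(Q.*(P'*Y*P))*P' up to round-off.
```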

10 Comments

Can you expand on that decomposition of Q? I tried it with a small example and am not getting the expected result:
A=[1 2 3;4 5 6];
A(end+1,:)=A(1,:)+A(2,:) % A is rank 2
A = 3×3
     1     2     3
     4     5     6
     5     7     9
[V,D]=eig(A)
V = 3×3
    0.2430    0.7483    0.4082
    0.5536   -0.6570   -0.8165
    0.7966    0.0913    0.4082
D = 3×3
   15.3899         0         0
         0   -0.3899         0
         0         0    0.0000
AA=D(1,1)*V(:,1)*V(:,1)' + D(2,2)*V(:,2)*V(:,2)' % should be equal to A?
AA = 3×3
    0.6905    2.2619    2.9524
    2.2619    4.5476    6.8095
    2.9524    6.8095    9.7619
@Paul A has to be symmetric for this decomposition to hold. Try a symmetric example, e.g. A = A'*A, and you should get the expected result.
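For instance, a small check of the symmetric case (assuming only that eig returns orthonormal eigenvectors for a symmetric input):

```matlab
A = [1 2 3; 4 5 6];
A(end+1,:) = A(1,:) + A(2,:);      % A is rank 2, but not symmetric
S = A'*A;                          % symmetric, and still rank 2
[V, D] = eig(S);
lam = diag(D);
SS = zeros(size(S));
for i = find(abs(lam) > 1e-10*max(abs(lam)))'   % sum over non-zero eigenpairs only
    SS = SS + lam(i)*V(:,i)*V(:,i)';
end
norm(SS - S)                       % essentially zero: the reconstruction matches
```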
@Matt J Sorry, I have tried this approach, and it does not work as expected. The main reason is that parfor cannot compute the matrix multiplications fully in parallel. On my 16-core CPU desktop, this method is much slower than directly computing the four matrix multiplications.
I use the following code to test the efficiency.
A = rand(1000);
B = rand(1000);
C = rand(1000);
At = A'; Bt = B';
tic % time 0.041966s
A*(B*C*Bt)*At;
toc
n = 10;
tic % time 0.168020s
for i = 1:n
A*C*At;
end
toc
tic % time 0.383589s
parfor i = 1:n
A*C*At;
end
toc
Even with n=1000 in this code, parfor is still slower than a normal for loop:
n = 1000;
tic % time 15.318453s
for i = 1:n
A*C*At;
end
toc
tic % time 17.380200s
parfor i = 1:n
A*C*At;
end
toc
@Zenan Li That test isn't definitive. You need to see what happens when you make appropriate use of parallel.pool.Constant:
A = num2cell(rand(1000,1000,10),[1,2]);
B = A;
At = A;
C = rand(1000);
tic;
C*(C.*(C*C*C))*C;
toc
n = 10;
tic % time 0.474655s
for i = 1:n
A{i}*C*At{i};
end
toc
p=parallel.pool.Constant({A,C,At});
tic % time 0.320019s
parfor i = 1:n
p.Value{1}{i}*p.Value{2}*p.Value{3}{i};
end
toc
@Matt J Thank you. I ran your test code, and my results seem strange:
Elapsed time is 0.041877 seconds.
Elapsed time is 0.139642 seconds.
Elapsed time is 0.256022 seconds.
I also found that even if I comment out the line p=parallel.pool.Constant({A,C,At}); the execution times do not change.
I also ran a parfor example from the MATLAB documentation.
clc; clear;
tic % Elapsed time is 16.497146 seconds.
n = 200;
A = 500;
a = zeros(1,n);
for i = 1:n
a(i) = max(abs(eig(rand(A))));
end
toc
tic % Elapsed time is 9.336770 seconds.
n = 200;
A = 500;
a = zeros(1,n);
parfor i = 1:n
a(i) = max(abs(eig(rand(A))));
end
toc
It seems parfor is more efficient than for in this case.
Zenan Li
Zenan Li on 11 May 2021
Edited: Zenan Li on 11 May 2021
Actually, parfor cannot further speed up separate matrix-matrix multiplications except on a cluster (see this related question). Thus this strategy does not work as expected (at least in MATLAB).
Thanks for pointing out that Q is symmetric. I missed that, even though it was clearly stated previously in this thread.
My problem is very similar to this question. I need to compute
ans = A*P*A', where A is sparse, P is symmetric, and P has no zero elements.
How can I speed up the calculation using the Parallel Computing Toolbox?
Bruno Luong
Bruno Luong on 18 May 2021
Edited: Bruno Luong on 18 May 2021
MATLAB matrix multiplication is already multi-threaded.
You won't be able to speed it up by adding parallelism on a single machine (PC).
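You can verify this on your own machine by restricting MATLAB's computational threads (a quick sketch using maxNumCompThreads):

```matlab
A = rand(2000);
t1 = timeit(@() A*A);            % default: multi-threaded BLAS
old = maxNumCompThreads(1);      % restrict MATLAB to a single computational thread
t2 = timeit(@() A*A);
maxNumCompThreads(old);          % restore the previous setting
[t1 t2]                          % t2 is typically several times larger than t1
```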


More Answers (1)

Matt J
Matt J on 8 May 2021
Edited: Matt J on 8 May 2021
I have tried gpuArray, but it cannot be faster than normal Matlab * operation (maybe my GPU is not good enough..)
Depending on your GPU and your precision requirements, it may be worth pre-casting the matrices to single precision. Some GPUs are not well optimized for double-precision floating-point operations. For example, on my machine with a GeForce GTX 1050:
dtype={'double','single'};
for i=1:2
[P,Q,Y]=deal(rand(1000,dtype{i}));
timeit(@() P*(Q.*(P'*Y*P))*P')
[P,Q,Y]=deal(gpuArray.rand(1000,dtype{i}));
gputimeit(@() P*(Q.*(P'*Y*P))*P.')
end
I obtain,
ans = %double CPU
0.0607
ans = %double GPU
0.1678
ans = %single CPU
0.0261
ans = %single GPU
0.0065
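If you want to quantify the accuracy cost of the single-precision route, a sketch like the following compares the GPU single-precision result against the CPU double-precision one (the matrix contents here are placeholders):

```matlab
P = orth(rand(1000));  Q = rand(1000);  Y = rand(1000);
Pg = gpuArray(single(P));  Qg = gpuArray(single(Q));  Yg = gpuArray(single(Y));
Rg = Pg*(Qg.*(Pg'*Yg*Pg))*Pg';
R  = double(gather(Rg));                      % back to double on the CPU
Rd = P*(Q.*(P'*Y*P))*P';
relErr = norm(R - Rd,'fro')/norm(Rd,'fro')    % roughly single-precision accuracy
```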

13 Comments

Thank you for your answer. I have used this strategy.
However, this code is part of an optimization problem, and I need high precision.
Thus the single-precision GPU computation can only be used in the first few loops.
Is there any other way to accelerate the double-precision computation?
Get a different graphics card? Here is the same test run on a GTX 1080 Ti. Obviously, this card is better optimized for double precision, and the speed-up for the double-precision GPU case is significant:
ans = %double CPU
0.0872
ans = %double GPU
0.0321
ans = %single CPU
0.0451
ans = %single GPU
0.0012
Thus the single precision computation with GPU acceleration can only support in the first few loops.
Have you really tested that?
I know. My graphics card is also a GTX 1080 Ti, but in a desktop. Still, double-precision computation on the GPU is slower than on the CPU for me. I ran GPUbench to verify this.
So double precision computation by GPU is slower than by using CPU
You must have an exceptionally powerful CPU... What times do you see when you run my test?
Hi Matt, sorry for the late reply. The times from running your test are below.
ans = %double CPU
0.0251
ans = %double GPU
0.0334
ans = %single CPU
0.0205
ans = %single GPU
0.0015
Zenan Li
Zenan Li on 10 May 2021
Edited: Zenan Li on 10 May 2021
The CPU is quite powerful (a 16-core i7-10870H)... I really want to know whether there is a way to pre-compute some matrices so that I have fewer matrix multiplications in each loop. I also looked at your nice package KronProd, but it may not be helpful for this case.
Matt J
Matt J on 10 May 2021
Edited: Matt J on 10 May 2021
No, I think you are doing the multiplication optimally. I might have said otherwise if Q were sparse.
Unfortunately, Q is not sparse. What if the matrices were smaller, e.g., X and Y are 100x100?
No. The problem is Q.
Thanks very much. Q is related to the proximal operator of the logdet function, so it cannot be sparse. But I am a little curious: if Q were sparse, how could I accelerate it further?
Well, for example, if Q=diag([1,0,0,....0]), then the computation reduces to
p=P(:,1);
ans=(p.'*Y*p)*(p*p.')
which is much easier.
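That rank-1 case generalizes to any sparse Q. Writing M = P'*Y*P and p_i = P(:,i), one has P*(Q.*M)*P' = sum over the nonzero entries Q(i,j) of Q(i,j)*M(i,j)*p_i*p_j', so the zero entries can be skipped entirely. A sketch (only worthwhile when nnz(Q) is small):

```matlab
[I, J, q] = find(Q);                % nonzero entries of the sparse Q
M = P'*Y*P;                         % two full products, done once per loop
R = zeros(size(Y));
for k = 1:numel(q)
    R = R + q(k)*M(I(k),J(k)) * (P(:,I(k))*P(:,J(k))');   % rank-1 update, O(n^2)
end
% R equals P*(Q.*M)*P'
```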
Thanks. I also find that although Q is not sparse, it is low-rank. What could I do?
For example, the size of Q is 1000x1000, but its rank is 10.


Asked:

on 7 May 2021

Edited:

on 18 May 2021
