How to accelerate multiple backslash operations?

Hi there,
I need to solve 500,000 linear systems, i.e. 500,000 evaluations of A\b, and I wonder what's the fastest way to do this, as I have to repeat it quite a few times.
The dimensions are A: 15x15x500000 and b: 15x1x500000, so the individual systems are rather small. A is dense.
Is the fastest option really just a for-loop like this one?
C = zeros(15, 500000);  % preallocate
for i = 1:500000
    C(:,i) = A(:,:,i) \ b(:,i);
end
What I have tried so far:
Because I have to run the calculations with different b's, it is not a big issue to compute the inverses (Ainv) of A once. So I used the multiprod package, which vectorizes matrix-vector multiplications:
C = multiprod(Ainv, b)
is around 6x faster than
for i = 1:500000
    C(:,i) = Ainv(:,:,i) * b(:,i);
end
I think the mmx package works in a similar way.
But it is often said that the backslash operator should be preferred over multiplying by an explicitly computed inverse, so I'm not very comfortable with that solution.
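As a sketch of a possible middle ground (my own assumption, not something benchmarked in this thread): when the A slices stay fixed across runs, each slice can be factored once with LU and the factors reused for every new b. This keeps backslash-like numerical behavior without ever forming inverses.

```matlab
% Sketch, assuming A (15x15x500000) is fixed and b changes between runs.
% Factor each slice once; lu() with three outputs returns P*A = L*U.
n = size(A, 1);  N = size(A, 3);
L = zeros(n, n, N);  U = zeros(n, n, N);  P = zeros(n, n, N);
for k = 1:N
    [L(:,:,k), U(:,:,k), P(:,:,k)] = lu(A(:,:,k));
end
% For each new right-hand-side matrix b (treated as 15-by-500000 here),
% two triangular solves replace the full backslash:
C = zeros(n, N);
for k = 1:N
    C(:,k) = U(:,:,k) \ (L(:,:,k) \ (P(:,:,k) * b(:,k)));
end
```

Whether this actually beats a plain backslash loop for 15x15 systems would need to be measured; the triangular solves are cheap, but the loop overhead remains.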
Maybe multithreading is a solution, but I don't have any experience with that.
Does somebody know whether it is possible to vectorize the backslash operator, or whether there is another way to speed things up?
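One way to express all 500,000 solves as a single backslash call, sketched here as an untested idea rather than a measured result, is to assemble the slices into one sparse block-diagonal matrix. Whether this beats the loop depends on the sparse solver's overhead for 15x15 blocks, so it would need benchmarking.

```matlab
% Sketch: one sparse block-diagonal solve for all systems at once.
% A is 15x15x500000, b is 15x1x500000 (or 15x500000).
n = 15;  N = 500000;
% A(:) lists each page column by column; build matching global indices so
% block k occupies rows/cols (k-1)*n+1 : k*n of the big matrix.
rows = repmat(repmat((1:n)', n, 1), N, 1) + n*repelem((0:N-1)', n*n);
cols = repelem((1:n*N)', n);
Ablk = sparse(rows, cols, A(:), n*N, n*N);
x = Ablk \ b(:);        % a single sparse backslash for everything
C = reshape(x, n, N);   % column k is the solution of system k
```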
I hope I expressed myself properly and didn't forget or overlook anything major :)
Any help would be much appreciated.
Julian

 Accepted Answer

If you have the Parallel Computing Toolbox, this is readily done on the GPU:
C = pagefun(@mldivide, gpuArray(A), gpuArray(b));

5 Comments

Thank you, I'll try that out as well.
Great. Please click Accept on the answer if it works for you.
Just using
C = pagefun(@mldivide, gpuArray(A), gpuArray(b));
in a loop is really slow (even slower than the usual backslash operator in a loop) on my machine (16 GB RAM, AMD FX 8350, GeForce GTX 750).
But does it even make sense to solve a lot of small mldivide operations in a loop on the GPU?
I thought using the GPU is only faster if you have big matrices and vectors, so it can actually parallelize things.
Or is there more to it than just putting it in a loop?
Each individual GPU operation requires a data transfer between the CPU host and the GPU. If you work with small chunks of data, the transfer overhead will kill any advantage you can get from the GPU.
@Jules, why are you still using a loop? The whole job should be done in a single call to pagefun. On my GPU (GeForce GTX 1080 Ti), the whole calculation takes 0.1 sec.
gd = gpuDevice;
A = gpuArray.rand(15, 15, 500000);
b = gpuArray.rand(15, 1, 500000);
tic;
C = pagefun(@mldivide, A, b);
wait(gd);   % make sure the GPU has finished before timing
toc;        % Elapsed time is 0.095832 seconds.


More Answers (1)

You might try this FEX submission.

2 Comments

Thanks, I'll take a look at it.
In a test on my PC, SliceMultiSolver reduces the MATLAB for-loop time by a factor of 2 to 3 for 15x15 matrices:
size(A) = [15 15 10000]
size(y) = [15 1 10000]
MultipleQRSolve time = 1.21478 [s]
Matlab loop time = 0.656033 [s]
SliceMultiSolver time = 0.218691 [s]
The test code is here:
nA = 15;
mA = 15;
nY = 1;
nP = 10000;
szA = [nA, mA, nP];
szY = [nA, nY, nP];
A = randn(szA) + 1i*randn(szA);
y = randn(szY) + 1i*randn(szY);
tic
% https://fr.mathworks.com/matlabcentral/fileexchange/68976-multipleqr
x1 = MultipleQRSolve(A, y);
t1 = toc;
tic
x2 = zeros(size(x1));
for k = 1:nP
    x2(:,:,k) = A(:,:,k) \ y(:,:,k);
end
t2 = toc;
tic
% https://fr.mathworks.com/matlabcentral/fileexchange/24260-multiple-same-size-linear-solver
x3 = SliceMultiSolver(A, y);
t3 = toc;
fprintf('size(A) = %s\n', mat2str(size(A)));
fprintf('size(y) = %s\n', mat2str(size(y)));
fprintf('MultipleQRSolve time = %g [s]\n', t1);
fprintf('Matlab loop time = %g [s]\n', t2);
if exist('t3', 'var')
    fprintf('SliceMultiSolver time = %g [s]\n', t3);
end


Release: R2018a
Asked: 20 Jun 2019
Edited: 5 Jul 2019
