How to quickly do Cholesky factorization for many small matrices?

In my project, I have to check many small matrices for positive-semidefiniteness (PSDness). I use chol() since it is much faster than using eig() to check for negative eigenvalues. Still it is very slow and the most time consuming part of my calculations, since I have to check 10e6 32x32 matrices over and over again. I already tried to use parfor and batch functions but it doesn't work (I also read this in the forums). However, it occured to me that the processor utilization is very low, so I checked what happens when I run the code on two instances of matlab on the same computer. The processor utilization doubled and each script finished in the same time it would have taken if they had been running alone. So it is possible to let chol() run in parallel on the same system. Any ideas how to do this in the same instance of matlab without incurring the problems you get with chol() in a parfor loop?

12 Comments

hi
if you are not too demanding and accept to move for instance to a QR decomposition instead of a Cholesky one, you may have two combined options to solve your question :
1 . seek for Householder QR algorithm proposed on Cleve Moler's blog and update it by using bsyfun and pagemtimes functions to process 32x32x1e6 tensors in a vectorized way....
2. make a variation on the theme to send computations to a GPU if available with enough GPU memory. This latter options may speed up the code by a factor greater than to 100 times.
good luck
Have you tried loop on
eigs(A, 1, 'sa')
EDIT: I tried it and it much solwer than chol or qr
I've already tried to make my own implementation of the Cholesky Decomposition which works well on my V100 GPU. It is extremely fast but only for matrices up to 16x16. With larger matrices, it seems to be slower than chol() in a for loop. Note that "matrices" already is a GPU-array.
function [isPSD_out]=check_PSD_chol(matrices)
%This function checks whether a large bunch of complex matrices is positive
%semi-definite, using the Cholesky facorization.
%For a single matrix, this is much slower than the matlab-implementation
%"chol"
%This function is intended to do the calculation for MANY small matrices.
%The input variable "matrices" is a (x,y,N)-array with N matrices to test.
%The output is an Nx1 logical array with "1" denoting that the
%corresponding matrix is PSD.
%This function is intended to run on a GPU. Taking the "gpuArray" and "gather" stuff
%away makes it CPU ready, but be aware that this might be slower than
%running matlabs "chol" in a simple for-loop.
[~,y,N]=size(matrices);
isPSD=gpuArray(true(N,1)); %at the start assume all matrices are PSD. Only matrices which are PSD are calculated
Summe=gpuArray(zeros(1,1,N));
for i=1:y
for j=1:i
Summe(1,1,isPSD)=matrices(i,j,isPSD);
Summe(1,1,isPSD)=Summe(1,1,isPSD) - sum(matrices(i,1:j-1,isPSD).*conj(matrices(j,1:j-1,isPSD)),2);
if i>j
matrices(i,j,isPSD)=Summe(1,1,isPSD)./(matrices(j,j,isPSD));
else
isPSD=isPSD & (squeeze((Summe)) > 0); %Here we check whether it is PSD. If "summe" is negative, it is not. We don't have to do any further calculations on such matrices.
matrices(i,i,isPSD)=sqrt((Summe(isPSD)));
end
end
end
isPSD_out=gather(isPSD);
end
Anyway, I wouldn't get the 32x32x10e6 matrices into the memory, and the datatransfer necessary in that case diminishes the gain from using the GPU.
a single comment to speed up your code :
gpuArray(true(N,1)) creates a logical vector on the cpu memory and then transfer it to the gpu unit....
you may directly use
true(N,1,'gpuarray') or gpuArray.true(N,1) which directly creates the vector on the gpu without any transfer loss...
the same applies to zeros(1,1,N)...
Something else to improve your code:
the if else statement within the two for loops is not efficient at all... and slows down computations
have you tried to vectorize the code in another way by changing your algorithm.... especially by using pagemtimes or bsxfun ? it could be helpful to eliminate one of the for loop ....
otherwise, you could switch to Bruno's Luong way using mex code
Thanks for the comments. You are right that creating the two arrays on the GPU is faster. I'll implement that, although it is negligible here.
How would you get rid of the if else statements? I know of no other algorithm to perform Cholesky Decomposition.
I've tried to use MMX, but unfortunately it doesn't work with complex matrices and it drops the "flag" for chol(). I am not interested in the result of the cholesky factorization, I only need to know whether it can be performed, so that I know which matrix is PSD.
Late comment, I'm just back from vacation: What are you doing based on knowing if chol succeeded on each matrix? That is, do you need to Cholesky factors for the next computation step? If not, I'd still be interested to know what the computation is - it's possible that chol and eig don't agree on whether a matrix is SPD in edge cases (that is, if there's an eigenvalue that's zero up to round-off error).
EDIT: If most of your matrices are expected to NOT be SPD, you can check this more efficiently by looking at the diagonal of each matrix - if any element is <= 0, it means the matrix cannot be SPD, so no need to call chol at all.
I only need to know whether each matrix is positive semidefinite. I do not need the cholesky factors.
It is about testing whether where S and D are NxN matrices.
I know that eig and chol do not agree in edge cases. This is not important for my calculations. They are used for compression algorithms we have developed for compressing specfic absorption rate (SAR) matrices for magnetic resonance imaging (MRI). (Orzada et al. 1, Orzada et al. 2). The effect of this difference is negligible in our case.
Until now I had only extensively worked with 8x8 and 16x16 matrices. Now we need to do 32x32 and the increase in calculation time is unbearable. It is not only that the matrices have 4 times more elements, we also have more matrices and need to compress further.
I've reworked my GPU code to work with lower triangles only:
function [isPSD_out]=check_PSD_chol_tri(matrices,y)
%This function checks whether a large bunch of matrices is positive
%semi-definite, using the Cholesky facorization.
%For a single matrix, this is much slower than the matlab-implementation
%"chol"
%This function is intended to do the calculation for MANY small matrices.
%The input variable "matrices" is a (x,N)-array with N matrices to test.
%Only the lower triangles are saved as a vector, these vectors are
%concatenated to for the "matrices" array.
%The output is an Nx1 logical array with "1" denoting that the
%corresponding matrix is PSD.
%This function is intended to run on a GPU. Taking the "gpuArray" and "gather" stuff
%away makes it CPU ready, but be aware that this might be slower than
%running matlabs "chol" in a simple for-loop.
[~,N]=size(matrices);
isPSD=true(N,1,'gpuArray'); %at the start assume all matrices are PSD. Only matrices which are PSD are calculated
for i=1:y
for j=1:i-1
matrices(sq2tri(i,j),isPSD)=(matrices(sq2tri(i,j),isPSD) - sum(matrices(sq2tri(i,1):sq2tri(i,j-1),isPSD).*conj(matrices(sq2tri(j,1):sq2tri(j,j-1),isPSD)),1))./(matrices(sq2tri(j,j),isPSD));
end
matrices(sq2tri(i,i),isPSD)=matrices(sq2tri(i,i),isPSD) - sum(matrices(sq2tri(i,1):sq2tri(i,i-1),isPSD).*conj(matrices(sq2tri(i,1):sq2tri(i,i-1),isPSD)),1);
isPSD=isPSD & (squeeze(real(matrices(sq2tri(i,i),:))) > 0).'; %Here we check whether it is PSD. If "summe" is negative, it is not. We don't have to do any further calculations on such matrices.
matrices(sq2tri(i,i),isPSD)=sqrt(matrices(sq2tri(i,i),isPSD));
end
isPSD_out=gather(isPSD);
end
function pos=sq2tri(x,y)
% This function maps the coordinates x,y of a lower triangle of a sqare matrix to a vector.
pos=(x-1)*x/2+y;
end
This is now up to 3 times faster than chol() in a for-loop, even including the time to transfer the data between GPU (GV100) and CPU (2x6 core) in several large chunks, since the whole set of matrices doesn't fit into the memory of the GPU. I still have the feeling, that this could be done much faster, looking at how fast several instances of matlab do the calculation with chol() on the two CPUs in parallel.
EDIT: Thanks for the idea with checking the main diagonal. Most of the matrices are expected to not be PSD. I've tried it on a example set and could recognize ca. 20% of the non-PSD matrices that way. I still have to test it in the compression algorithm, when the differences become smaller and smaller.
Can chol() be used to check for positive semi-definite? The doc page says the that the output flag only indicates positive definite (or not).
If the matrices are complex, what does it mean to check that the diagonal elements are <= 0 (or should that be <0)?
To Paul
PSD is equivalent to L'*L factorizable, this is trivial to see, since
x'*A*x is equal to y'*y = norm(y)^2 with y = L*x.
Though I also agree with Christine Tobler, numerically one must be careful using CHOL in liminting cases.
The cholesky factorization is defined for matrix that is fHermitian so the diagonal must be real. <=0 check makes sense.
I was referring to the OP's statements about using the chol() function to check for positive semi-definite. As I understand chol(), it doesn't factor a positive semi-definite matrix, so I don't understand how chol() would be used in this application.
M = diag([0 1 1]); % a positive semi-definite matrix that should be accepeted for this problem
[R,flag] = chol(M)
R = []
flag = 1
Good point (as usual) about the the diagonal of a Hermitian matrix. The OP didn't explicitly state that the input matrices are Hermitian, but I guess that's a good assumption.
In real life 0 eigen value does not exist exactly. CHOL, EIG, EIGS all might fail due to round off.
And Stephan Orzada end up programming his own Cholesky decomposition for his own need, not MATLAB CHOL.
You are right. For my purpose, I don't care whether it is positive semidefinite or positive definite. Mathematically there is a difference, but it is negligible in many practical cases.

Sign in to comment.

Answers (1)

2 Comments

Thank you, but unfortunately this doesn't work, since chol in MMX ignores the imaginary part of the complex matrices, and anyway does not support the flag which I need for determining PSDness. I couldn't even tell from the results, because even if the matrix is not PSD (for which the Cholesky decomposition doesn't work) it produces a "result" without any error messages.
Indeed mmx only deal with real matrix.
But if you are willing to modify the code, you might change the function dpotrf in line 284 of mmx.cpp to zpotrf
Of course you have to take care of retriving MATLAB complex internal interleaved data.
You might ask the author if he can gives you a hand for such task.

Sign in to comment.

Categories

Find more on Linear Algebra in Help Center and File Exchange

Products

Release

R2021a

Asked:

on 23 Jul 2021

Commented:

on 1 Aug 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!