How to quickly do Cholesky factorization for many small matrices?

Question

1 vote

In my project, I have to check many small matrices for positive-semidefiniteness (PSDness). I use chol() since it is much faster than using eig() to check for negative eigenvalues. Still it is very slow and the most time consuming part of my calculations, since I have to check 10e6 32x32 matrices over and over again. I already tried to use parfor and batch functions but it doesn't work (I also read this in the forums). However, it occured to me that the processor utilization is very low, so I checked what happens when I run the code on two instances of matlab on the same computer. The processor utilization doubled and each script finished in the same time it would have taken if they had been running alone. So it is possible to let chol() run in parallel on the same system. Any ideas how to do this in the same instance of matlab without incurring the problems you get with chol() in a parfor loop?

12 Comments
Show 10 older comments Hide 10 older comments

Stephan Orzada on 23 Jul 2021

Open in MATLAB Online

I've already tried to make my own implementation of the Cholesky Decomposition which works well on my V100 GPU. It is extremely fast but only for matrices up to 16x16. With larger matrices, it seems to be slower than chol() in a for loop. Note that "matrices" already is a GPU-array.

function [isPSD_out]=check_PSD_chol(matrices)
%This function checks whether a large bunch of complex matrices is positive
%semi-definite, using the Cholesky facorization.
%For a single matrix, this is much slower than the matlab-implementation
%"chol"
%This function is intended to do the calculation for MANY small matrices.
%The input variable "matrices" is a (x,y,N)-array with N matrices to test.
%The output is an Nx1 logical array with "1" denoting that the
%corresponding matrix is PSD.
%This function is intended to run on a GPU. Taking the "gpuArray" and "gather" stuff
%away makes it CPU ready, but be aware that this might be slower than
%running matlabs "chol" in a simple for-loop.
[~,y,N]=size(matrices);
isPSD=gpuArray(true(N,1)); %at the start assume all matrices are PSD. Only matrices which are PSD are calculated
Summe=gpuArray(zeros(1,1,N));
for i=1:y
    for j=1:i
        Summe(1,1,isPSD)=matrices(i,j,isPSD);
        Summe(1,1,isPSD)=Summe(1,1,isPSD) - sum(matrices(i,1:j-1,isPSD).*conj(matrices(j,1:j-1,isPSD)),2);
        if i>j
            matrices(i,j,isPSD)=Summe(1,1,isPSD)./(matrices(j,j,isPSD));
        else
        isPSD=isPSD & (squeeze((Summe)) > 0); %Here we check whether it is PSD. If "summe" is negative, it is not. We don't have to do any further calculations on such matrices.
        matrices(i,i,isPSD)=sqrt((Summe(isPSD)));
        end
    end
end
isPSD_out=gather(isPSD);
end

Anyway, I wouldn't get the 32x32x10e6 matrices into the memory, and the datatransfer necessary in that case diminishes the gain from using the GPU.

Stephan Orzada on 30 Jul 2021

Edited: Stephan Orzada on 30 Jul 2021

Open in MATLAB Online

I only need to know whether each matrix is positive semidefinite. I do not need the cholesky factors.

It is about testing whether

where S and D are NxN matrices.

I know that eig and chol do not agree in edge cases. This is not important for my calculations. They are used for compression algorithms we have developed for compressing specfic absorption rate (SAR) matrices for magnetic resonance imaging (MRI). (Orzada et al. 1, Orzada et al. 2). The effect of this difference is negligible in our case.

Until now I had only extensively worked with 8x8 and 16x16 matrices. Now we need to do 32x32 and the increase in calculation time is unbearable. It is not only that the matrices have 4 times more elements, we also have more matrices and need to compress further.

I've reworked my GPU code to work with lower triangles only:

function [isPSD_out]=check_PSD_chol_tri(matrices,y)
%This function checks whether a large bunch of matrices is positive
%semi-definite, using the Cholesky facorization.
%For a single matrix, this is much slower than the matlab-implementation
%"chol"
%This function is intended to do the calculation for MANY small matrices.
%The input variable "matrices" is a (x,N)-array with N matrices to test.
%Only the lower triangles are saved as a vector, these vectors are
%concatenated to for the "matrices" array.
%The output is an Nx1 logical array with "1" denoting that the
%corresponding matrix is PSD.
%This function is intended to run on a GPU. Taking the "gpuArray" and "gather" stuff
%away makes it CPU ready, but be aware that this might be slower than
%running matlabs "chol" in a simple for-loop.
[~,N]=size(matrices);
isPSD=true(N,1,'gpuArray'); %at the start assume all matrices are PSD. Only matrices which are PSD are calculated
for i=1:y
    for j=1:i-1
        matrices(sq2tri(i,j),isPSD)=(matrices(sq2tri(i,j),isPSD) - sum(matrices(sq2tri(i,1):sq2tri(i,j-1),isPSD).*conj(matrices(sq2tri(j,1):sq2tri(j,j-1),isPSD)),1))./(matrices(sq2tri(j,j),isPSD));
    end
        matrices(sq2tri(i,i),isPSD)=matrices(sq2tri(i,i),isPSD) - sum(matrices(sq2tri(i,1):sq2tri(i,i-1),isPSD).*conj(matrices(sq2tri(i,1):sq2tri(i,i-1),isPSD)),1);
        isPSD=isPSD & (squeeze(real(matrices(sq2tri(i,i),:))) > 0).'; %Here we check whether it is PSD. If "summe" is negative, it is not. We don't have to do any further calculations on such matrices.
        matrices(sq2tri(i,i),isPSD)=sqrt(matrices(sq2tri(i,i),isPSD));
end
isPSD_out=gather(isPSD);
end
function pos=sq2tri(x,y)
% This function maps the coordinates x,y of a  lower triangle of a sqare matrix to a vector.
pos=(x-1)*x/2+y;
end

This is now up to 3 times faster than chol() in a for-loop, even including the time to transfer the data between GPU (GV100) and CPU (2x6 core) in several large chunks, since the whole set of matrices doesn't fit into the memory of the GPU. I still have the feeling, that this could be done much faster, looking at how fast several instances of matlab do the calculation with chol() on the two CPUs in parallel.

EDIT: Thanks for the idea with checking the main diagonal. Most of the matrices are expected to not be PSD. I've tried it on a example set and could recognize ca. 20% of the non-PSD matrices that way. I still have to test it in the compression algorithm, when the differences become smaller and smaller.

Bruno Luong on 31 Jul 2021

Edited: Bruno Luong on 31 Jul 2021

In real life 0 eigen value does not exist exactly. CHOL, EIG, EIGS all might fail due to round off.

And Stephan Orzada end up programming his own Cholesky decomposition for his own need, not MATLAB CHOL.

Stephan Orzada on 1 Aug 2021

You are right. For my purpose, I don't care whether it is positive semidefinite or positive definite. Mathematically there is a difference, but it is negligible in many practical cases.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Bruno Luong on 23 Jul 2021

0 votes

It requires MEX, but it should be fast

https://www.mathworks.com/matlabcentral/fileexchange/37515-mmx

2 Comments
Show None Hide None

Stephan Orzada on 23 Jul 2021

Edited: Stephan Orzada on 23 Jul 2021

Thank you, but unfortunately this doesn't work, since chol in MMX ignores the imaginary part of the complex matrices, and anyway does not support the flag which I need for determining PSDness. I couldn't even tell from the results, because even if the matrix is not PSD (for which the Cholesky decomposition doesn't work) it produces a "result" without any error messages.

Bruno Luong on 24 Jul 2021

Edited: Bruno Luong on 24 Jul 2021

Indeed mmx only deal with real matrix.

But if you are willing to modify the code, you might change the function dpotrf in line 284 of mmx.cpp to zpotrf

Of course you have to take care of retriving MATLAB complex internal interleaved data.

You might ask the author if he can gives you a hand for such task.

Sign in to comment.

How to quickly do Cholesky factorization for many small matrices?

12 Comments
Show 10 older comments Hide 10 older comments

Answers (1)

2 Comments
Show None Hide None

Categories

Products

Release

Tags

Community Treasure Hunt

How to quickly do Cholesky factorization for many small matrices?

12 Comments Show 10 older comments Hide 10 older comments

Answers (1)

2 Comments Show None Hide None

Categories

Products

Release

Tags

See Also

Community Treasure Hunt

12 Comments
Show 10 older comments Hide 10 older comments

2 Comments
Show None Hide None