Solving large linear systems of equations in parallel / distributed systems

What would be the best approach to solving a large system of linear equations on a distributed system? The type of system that could be translated into a large dense square symmetric matrix of, say, 60,000 x 60,000 elements to start with.

3 Comments

The best way? Don't do it at all.
The point being, almost always when someone thinks they need to invert a matrix, they never needed to do so in the first place. Many people are just following a formula that shows a matrix inverse. Sure, it is a nice way to write it. Authors love it, as it is easy to write. A^-1, GREAT. (And at least some of the time, the author of that text or paper did not themselves understand the subtleties of what they are telling you to do. Sorry, but true, at least some of the time. And editors don't have the skills in numerical linear algebra to know better. They may rely on referees and other sources for that. But referees are themselves not always knowledgeable.)
But effectively that is usually just a way of solving a linear system of equations. So you probably need to work with some matrix factorizations to solve your problem, instead of actually ever computing the matrix inverse.
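To make that concrete, here is a minimal sketch in Python/SciPy, standing in for MATLAB's chol and backslash. The matrix here is a made-up symmetric positive definite test matrix, built purely for illustration: the point is that you factor once, then solve by forward/back substitution, and no inverse is ever formed.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n = 500
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)   # symmetric positive definite test matrix
b = rng.standard_normal(n)

# Factor once (Cholesky: A = L L'), then solve by forward and back
# substitution. inv(A) is never computed.
c, low = cho_factor(A)
x = cho_solve((c, low), b)

print(np.allclose(A @ x, b))  # True
```

The factorization can also be reused for many right-hand sides, which is another reason it beats forming the inverse.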
Of course, we don't really know why you think you need to do this. But the fact remains: if you don't know how to solve the problem, that is almost always a signal that you are trying to solve the wrong problem. What can we know, though? Maybe you really, really do need to solve exactly what you think you need to solve. I can think of some circumstances where a matrix inverse truly is what I want, and even then there are better ways to get it.
For example, suppose I wanted to form the diagonal elements of the inverse matrix inv(X'*X). This is a common problem in statistics. But we can do that by first computing a QR factorization of X; we would probably need that anyway. Once we have the upper triangular matrix R, we can get at those diagonal elements from a backsolve applied to R. And since R is upper triangular, this part is efficient. The nice thing about this approach is that you need never compute (X'*X) in the first place, which is a terribly bad thing to do to a matrix.
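The recipe above can be sketched in Python/NumPy (the random X is just a stand-in test matrix; MATLAB's qr and a triangular solve work the same way). Since X = Q*R implies inv(X'*X) = inv(R)*inv(R)', the diagonal elements are the row-wise sums of squares of inv(R), obtained from a single backsolve:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))

# X = Q R  =>  X'X = R'R  =>  inv(X'X) = inv(R) inv(R)'
_, R = np.linalg.qr(X)

# Backsolve against the identity; cheap because R is upper triangular.
Rinv = solve_triangular(R, np.eye(R.shape[0]))

# diag(inv(R) inv(R)') = row-wise sums of squares of inv(R),
# so X'X is never formed explicitly.
d = np.sum(Rinv**2, axis=1)

print(np.allclose(d, np.diag(np.linalg.inv(X.T @ X))))  # True
```

The comparison against the explicit inverse is only there to verify the identity; in practice you would stop after computing d.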
Again I would STRONGLY suggest that what you need to do is understand why you think you need to do what you are doing, and whether it really is a good idea in the first place. Then spend some time talking to a mathematician who understands numerical linear algebra. Explain what you are doing, rather than just telling them you need to compute an inverse.
If you're solving a linear system of equations, then in addition to not inverting the matrix, you might be able to get away with not creating the matrix at all!
Most if not all of the iterative solvers can accept either a coefficient matrix or a function handle that evaluates some type of matrix-vector product. If your matrix has structure that you can exploit to compute those products without explicitly creating the matrix, that can save you memory. See, for example, the "Using Function Handle Instead of Numeric Matrix" example on the gmres documentation page.
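As an illustration of the same idea in Python/SciPy (scipy.sparse.linalg.gmres playing the role of MATLAB's gmres), the operator below is a made-up identity-plus-rank-one matrix, chosen only because its matrix-vector product is cheap. The full n-by-n matrix is never built:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

n = 100_000
rng = np.random.default_rng(0)
u = rng.standard_normal(n)
b = rng.standard_normal(n)

# A = I + u u' would be a dense n-by-n matrix (~80 GB at n = 100000),
# but its matrix-vector product needs only O(n) work and memory.
def matvec(x):
    return x + u * (u @ x)

A = LinearOperator((n, n), matvec=matvec)

x, info = gmres(A, b, atol=1e-10)
res = np.linalg.norm(x + u * (u @ x) - b) / np.linalg.norm(b)
print(info)         # 0 indicates convergence
print(res < 1e-4)   # True
```

This particular operator has only two distinct eigenvalues, so GMRES converges almost immediately; the point is the pattern, not the matrix.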
Thank you for your comments, and yes @John D'Errico, the very reason for my question is to ponder whether it is a good idea or not, although I admit my question was ill-formulated. And yes, it comes down to solving a linear system of equations.
QR decomposition, or actually the SVD, would be the "traditional approaches" we are "familiar" with, but some basic searching on solving large linear systems of equations has turned up approaches like Stochastic Gradient Descent.
@Steven Lord thank you for pointing out the gmres documentation page, I will have a look.
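One note on the iterative route: for a symmetric positive definite system, the usual Krylov workhorse is the conjugate gradient method (pcg in MATLAB, scipy.sparse.linalg.cg in SciPy) rather than gradient-descent variants, and like gmres it needs only matrix-vector products. A small sketch, using a diagonal operator purely as a stand-in for any cheap matvec:

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

n = 2000
rng = np.random.default_rng(0)
d = rng.uniform(1.0, 10.0, n)   # stand-in SPD spectrum, well conditioned

# Symmetric positive definite operator applied as a function:
# here simply a diagonal matrix, representing any matvec routine.
A = LinearOperator((n, n), matvec=lambda x: d * x)
b = rng.standard_normal(n)

x, info = cg(A, b, atol=1e-12)
res = np.linalg.norm(d * x - b) / np.linalg.norm(b)
print(info)         # 0 indicates convergence
print(res < 1e-4)   # True
```

Each CG iteration costs one matvec plus a few vector operations, all of which parallelize and distribute naturally.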


 Accepted Answer

GB = 60000^2*8/1024^3
GB = 26.8221
The matrix is close to 27 gigabytes. The inverse is the same size, so you will need at least 54 gigabytes to store the matrix and its inverse.
Suppose you have 60000 cores (actually possible! see https://www.pcgamer.com/this-computer-is-26-inches-tall-and-houses-a-400000-core-processor/ ) and you have each one compute the minors "leave one out" style. Does that reduce the task to 60000 simultaneous computations on a 60000 x 1 matrix? NO, it reduces the task to 60000 simultaneous computations on 59999 x 59999 matrices.
To within round-off error, if you divide the work over N cores, each core still needs to work with an array just fractionally smaller than the original 27 gigabytes. Even supposing you could arrange for all cores to share the same output array, and even supposing that each core did not need an intermediate array (probably not true), with 128 gigabytes of total RAM you would only be able to split the work across at most 3 cores: 27 * 3 slightly-different inputs + 27 for the output. In reality, because intermediate arrays would be needed, you would probably not be able to use more than 2 cores in 128 gigabytes.
How large is your computer system? How much RAM? How many cores?

2 Comments

Hi Walter,
Thank you for your reply. From the size point of view, we were already well aware that this was a challenge, not only in terms of RAM but also in terms of communication. Thankfully, these days 256 GB is not beyond reach.
I do not know whether using a distributed system could potentially be of use in such a situation; it probably could be in some cases, such as if you have a block-diagonal system.
I would point out, though, that the current solvers used by \ call into high-performance libraries that are already multi-core. Some of the steps, such as matrix multiplication or dot products, can be broken into pieces and performed over multiple cores, and the solvers invoked by \ already know how to do that.


More Answers (0)
