Fastest large SVD computation on a multithreaded machine?

I have a waveform optimization problem that uses a large SVD in an iterative loop. For instance, with my current settings, the matrix is 18800x18937, or roughly 356M elements (complex, either double or single; I've been experimenting with both). I have a machine with enough RAM (128 GB) to hold the entire thing; for other portions of the code I get some help using parpool('threads') and a parfor. Given the matrix sizes, I don't see a process pool as an option for those other portions of the code.
Watching the utilization, the native SVD appears to use about 6-10 threads, so it's partly parallel. I don't know if this is a Linux processor-counting problem, because the parpool only created 10 workers even though the machine reports 20 CPU cores. SPMD and distributed arrays seem to be only for process pools; with threads I don't need to distribute the array in memory.
Any tips on what to try, or is it going as fast as possible already? RdTildeRoot is 18800x18800 and computed once, Z is updated each iteration. From profiling, the SVD on this one line is 90% of the execution time.
[Ubar, ~, Utilde] = svd(RdTildeRoot * Z, 'econ');
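For anyone comparing double and single precision as described above, here is a minimal timing sketch on a small stand-in matrix. The sizes n and m and the random matrix A are illustrative assumptions, not the real RdTildeRoot * Z; at 18800x18937 the complex double input alone is about 5.7 GB (16 bytes per element), and half that in single.

```matlab
% Hypothetical small-scale stand-in for RdTildeRoot * Z (sizes are illustrative).
n = 400; m = 420;
rng(0);                                   % reproducible random data
A  = complex(randn(n, m), randn(n, m));   % complex double
As = single(A);                           % same data in complex single

% Time the economy-size SVD with three outputs, as in the original call.
tDouble = timeit(@() svd(A,  'econ'), 3);
tSingle = timeit(@() svd(As, 'econ'), 3);
fprintf('double: %.3f s, single: %.3f s\n', tDouble, tSingle);

% Check how much accuracy single precision gives up on this stand-in.
[Us, Ss, Vs] = svd(As, 'econ');
relErrSingle = norm(double(Us) * double(Ss) * double(Vs)' - A, 'fro') / norm(A, 'fro');
fprintf('relative reconstruction error (single): %.2e\n', relErrSingle);
```

Timings on a matrix this small will not extrapolate directly to the full problem, so treat this only as a template to rerun at a larger size that still finishes quickly.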
  4 Comments
Martin Ryba on 25 Jan 2022
Ah, using /proc/cpuinfo, it's an Intel Xeon W-2155, so it has 10 physical cores and 20 hardware threads. So the parpool probably did the right thing? Or should it perhaps be allowed to go higher? I found with process pools that using about 85% of the cores is typically best. maxNumCompThreads comes back with 10. Is that really what the SVD is using, or are there algorithm limitations? The load average while the SVD is running is generally about 7-8, and top shows a CPU utilization below 50%. The SVD takes on the order of half an hour per iteration, so those averages are pretty stable. I guess I can try upping it and see what happens. Thanks for the advice.
Martin Ryba on 25 Jan 2022
Update: upping maxNumCompThreads to 15, at least while it's in the middle of a run (hit pause and then continue), didn't change much. Watching top while it's going, the CPU% does appear to top out at 1000%, which makes sense for 10 physical cores. At times during the SVD it drops down to 250% or so, so perhaps there are portions of the algorithm that can't run at full rate. Unfortunately, given the array size, I probably can't push it to the GPU.
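The thread-count experiment described above can be made repeatable with a small sweep. This is only a sketch on a stand-in matrix (the 500x500 size and the list of thread counts are assumptions for illustration), and it restores the original setting afterwards.

```matlab
% Hypothetical stand-in matrix; the real problem is far larger.
rng(0);
A = complex(randn(500), randn(500));

original = maxNumCompThreads;           % remember the current setting
counts   = unique([1 2 4 original]);    % thread counts to try (illustrative)
times    = zeros(size(counts));

for k = 1:numel(counts)
    maxNumCompThreads(counts(k));       % set the computational thread limit
    times(k) = timeit(@() svd(A, 'econ'), 3);
    fprintf('%2d threads: %.3f s\n', counts(k), times(k));
end

maxNumCompThreads(original);            % restore the original setting
```

If the times stop improving well before the physical core count, that would be consistent with the observation that parts of the SVD do not scale across threads.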


Answers (1)

Benjamin Thompson on 25 Jan 2022
If you have the Parallel Computing Toolbox and a good NVIDIA GPU, try using gpuArray. If you don't need all the singular values, try svds, which computes only a subset of them.
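A rough sketch of both suggestions on a small stand-in matrix follows. The matrix size and the value of k are assumptions for illustration. The GPU branch only runs if Parallel Computing Toolbox and a supported GPU are available, and its memory check covers the input alone; the outputs and solver workspace need additional memory, so passing the check does not guarantee the full problem fits.

```matlab
% Hypothetical small stand-in for RdTildeRoot * Z.
rng(0);
A = complex(randn(300, 320), randn(300, 320));

% Option 1: svds, useful only if the k largest singular triplets are enough.
k = 10;                                  % illustrative choice
[Uk, Sk, Vk] = svds(A, k);
sFull  = svd(A);                         % reference values for comparison
relErr = norm(diag(Sk) - sFull(1:k)) / norm(sFull(1:k));
fprintf('svds vs svd, relative error in top %d values: %.2e\n', k, relErr);

% Option 2: economy-size SVD on the GPU, if one is usable.
if canUseGPU
    g = gpuDevice;
    bytesNeeded = 16 * numel(A);         % complex double input only
    if bytesNeeded < g.AvailableMemory
        [Ug, ~, Vg] = svd(gpuArray(A), 'econ');
        Ubar   = gather(Ug);             % copy results back to host memory
        Utilde = gather(Vg);
    end
end
```

Note that the original call keeps the full economy-size U and V, so svds only helps if the downstream code can work with a truncated set of singular vectors.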

Release

R2021a