Optimize cluster configuration for a virtual machine

I'm testing a parfor loop where each parallel iteration has to do some pretty heavy processing (~70 GB peak memory usage, ~3 h runtime without parfor). Eventually I need to run this on hundreds of datasets, so I want to make sure the cluster configuration is reasonable. I'm running MATLAB R2020a (Windows 10) interactively on a configurable virtual machine (256 GB RAM, 32 virtual processors).
Here's what I do:
myClus=parcluster('local'); %from virtual machine perspective, all hardware is local
myClus.NumWorkers=3; %limited by RAM: 3*70 GB < 256 GB capacity
myClus.NumThreads=10; %10 threads (=virtual processors?) per worker: 3*10 < 32 available processors
parpool(myClus);
%parfor loop...
Is this a reasonable (or at least not terribly inefficient) setup in this situation? Specifically:
  • Is setting NumThreads = (number of virtual processors)/NumWorkers, as I'm essentially doing, an acceptable strategy? I'm assuming a thread corresponds to a virtual processor in my case, but I'm not entirely sure.
  • Would increasing NumThreads be expected to improve performance further? Since I can configure my hardware, I could conceivably go for, e.g., 64 processors and set NumThreads=20.
  • Bonus points for other optimization suggestions.
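A minimal sketch of the same sizing logic, deriving the pool size from the VM's resources so it can simply be rerun after the VM is resized. It assumes Windows (the `memory` function and the `NUMBER_OF_PROCESSORS` environment variable are Windows-specific) and a measured peak of roughly 70 GB per iteration. `peakPerWorkerGB` and the hypothetical `headroomGB` margin are assumptions to tune, and GB versus GiB is treated loosely.

```matlab
% Sketch: size the local pool from the VM's resources (Windows only).
peakPerWorkerGB = 70;   % measured peak memory of one parfor iteration
headroomGB      = 16;   % hypothetical margin for Windows and the MATLAB client

[~, sys] = memory;                                     % memory() exists only on Windows
totalGB  = sys.PhysicalMemory.Total / 2^30;            % installed RAM, bytes -> GiB
nProc    = str2double(getenv('NUMBER_OF_PROCESSORS')); % logical (virtual) processors

nWorkers = max(1, floor((totalGB - headroomGB) / peakPerWorkerGB)); % RAM-limited
nThreads = max(1, floor(nProc / nWorkers));                          % split the CPUs

delete(gcp('nocreate'));            % close any pool that is already running
myClus = parcluster('local');
myClus.NumWorkers = nWorkers;
myClus.NumThreads = nThreads;
pool = parpool(myClus, nWorkers);
```

On the VM described above (256 GB, 32 virtual processors) this arithmetic reproduces the hand-picked values: 3 workers with 10 threads each.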

Answers (1)

Raymond Norris on 31 Jan 2022
When you run the code serially, do you see high core activity? If not, you might not see any appreciable difference from increasing NumThreads. It depends on the work the parfor loop is running. It might be worth testing by throttling NumThreads and seeing what performance improvement you get.
Have you considered whether, at some point, you'll need to scale your code to a cluster or the cloud?
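One way to act on the throttling suggestion before touching the pool is to time a representative piece of the workload serially in the client at several computational thread counts, using `maxNumCompThreads`. This is a sketch: `processOneDataset` is a hypothetical placeholder (here just an SVD of a random matrix) that you would replace with a shortened version of your own per-dataset code. It changes the client session only, not the pool workers.

```matlab
% Sketch: time one representative workload at several thread counts.
% processOneDataset is a hypothetical stand-in for the real per-dataset code.
processOneDataset = @() sum(svd(rand(2000)));   % placeholder workload

threadCounts = [1 2 4 8 16];
elapsed      = zeros(size(threadCounts));
originalN    = maxNumCompThreads;               % remember the current setting

for k = 1:numel(threadCounts)
    maxNumCompThreads(threadCounts(k));         % cap computational threads
    t = tic;
    processOneDataset();
    elapsed(k) = toc(t);
end

maxNumCompThreads(originalN);                   % restore the original setting
disp(table(threadCounts.', elapsed.', 'VariableNames', {'Threads', 'Seconds'}))
```

If the timings flatten out beyond some thread count, raising NumThreads past that point is unlikely to help much on the workers either, especially since several workers also share memory bandwidth.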
  2 Comments
Roy on 31 Jan 2022
Thanks for the suggestion. I've now checked what happens when running serially. CPU usage (according to Windows Task Manager) is a mixed bag: different parts of the code use, very roughly, 5-10%, 20-40%, or 70-100%. That suggests a larger NumThreads would benefit at least part of the code.
A related question now comes to mind (I could edit it into the question): are there rules of thumb for how many threads to "set aside" for system processes in these situations? I imagine that with 32 cores, a pool of 4 workers with 8 threads each might actually be less efficient (due to interference from system processes) than assigning 7 threads to each worker.
Happy to test everything, of course, but I'm hoping to gain some insight to avoid nonsensical settings.
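There is no official rule for the size of such a reserve, so the following only makes the arithmetic explicit. It is a sketch assuming a hypothetical reserve of 4 logical processors for Windows and the MATLAB client; with 32 processors and 4 workers it yields 7 threads per worker, the alternative described in the comment.

```matlab
% Sketch: thread budget per worker with some processors held in reserve.
nProcessors = 32;   % logical processors on the VM
nWorkers    = 4;    % pool size, limited by RAM
reserve     = 4;    % hypothetical: processors left for Windows and the client

threadsPerWorker = floor((nProcessors - reserve) / nWorkers);
threadsUsed      = nWorkers * threadsPerWorker;

fprintf('%d workers x %d threads = %d of %d processors\n', ...
        nWorkers, threadsPerWorker, threadsUsed, nProcessors);
```

This prints `4 workers x 7 threads = 28 of 32 processors`. Whether the reserve actually pays off is something only a timing comparison on the real workload can show.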
Raymond Norris on 31 Jan 2022
I don't have a rule of thumb, but keep in mind that these are threads, not processes. By default, MATLAB has been mapping the computational thread count to your physical core count all along. If you haven't seen a degradation in your system, it's fine to leave it as is. But as you say, testing might be the best confirmation.
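Pool workers do not inherit the client's default: unless NumThreads is raised, each worker on a local cluster runs with a single computational thread. A minimal sketch to confirm what the client and each worker actually got, by querying `maxNumCompThreads` inside `spmd`; it assumes a pool is already open, or otherwise opens one with the default local profile.

```matlab
% Sketch: report the computational thread limit on the client and on each worker.
pool = gcp('nocreate');                  % existing pool, if any
if isempty(pool)
    pool = parpool(parcluster('local')); % assumption: default local profile
end

spmd
    workerThreads = maxNumCompThreads;   % evaluated separately on every worker
end

fprintf('Client: %d threads\n', maxNumCompThreads);
for k = 1:pool.NumWorkers
    fprintf('Worker %d: %d threads\n', k, workerThreads{k});  % Composite indexing
end
```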


Categories: Cluster Configuration

Release: R2020a
