Severe speed degradation with batch function on cluster

I run a MATLAB script called "runMe.m", containing "parpool('local',16)", through a PBS job sent to a worker node (method 1). I get the same runtime as when I run the script interactively on the compute node (method 2, for testing only).
My problem is when I call "runMe.m" using the batch function from the head node (method 3). The only change is that "parpool('local',16)" is commented out in "runMe.m", because the pool is set up from the head node. Here is some pseudocode of the batch call:
pbs = parcluster('singleNodeJob')
job = pbs.batch(@runMe,0,{},'Pool',15)
'singleNodeJob' is configured with 16 workers and the PBS option "-l nodes=1:ppn=16", the same as the method 1 PBS script. I'm unsure how to set the number of threads (NumThreads) in 'singleNodeJob', because "parpool('local',16)" clearly runs with multiple threads on my worker node.
"runMe.m" has parfor loops that run well locally (methods 1 and 2), in about 5 minutes. But when run via batch (method 3), the runtime balloons: about 20 minutes with NumThreads in 'singleNodeJob' set to 8, and 30 minutes with it set to 1. "runMe.m" contains both parfor loops and multithreaded functions (e.g. corr).
For context, I'm trying to use method 3 because it gives me enough toolbox licenses, whereas with method 1 I run out of toolbox licenses.
Thank you for your help!

Answers (1)

Raymond Norris on 7 Apr 2022
Let me see if I have this right:
  • Method #1 finishes runMe in ~5 minutes, using the local scheduler
PBS jobscript
#!/bin/sh
#PBS -l nodes=1:ppn=16
module load matlab
matlab -batch runMe
runMe.m
function runMe
parpool("local",16);
parfor idx = 1:160
...
end
  • Method #2 finishes runMe in ~5 minutes, using the local scheduler
% # Run MATLAB on the compute node
% qsub -I -l nodes=1:ppn=16
% module load matlab
% matlab
runMe
  • Method #3 finishes runMe in 20-30 minutes, using the PBS scheduler. singleNodeJob is configured with 16 workers and -l nodes=1:ppn=16
% # Run MATLAB on the head node. Preferred choice because you have a
% # limited number of MATLAB and Toolbox licenses, and would like to use
% # MATLAB Parallel Server licenses instead.
% module load matlab
% matlab
Submit job
pbs = parcluster('singleNodeJob');
job = pbs.batch(@runMe,0,{},'Pool',15);
runMe.m
function runMe
% parpool("local",16);
parfor idx = 1:160
% Note: the pool has 15 workers, not 16 (the 16th worker runs runMe itself)
...
end
A couple of thoughts:
  1. How are you measuring the 5, 20, and 30 minutes?
  2. When using the local scheduler, you're using resources already allocated to you in the PBS job, so the parallel pool starts quickly. If you're running a PBS pool, you're submitting an "inner" job, and who knows how busy the queue is. It might take 10-15 minutes for this inner job to start. This could be the crux of the matter.
  3. Suppose you set NumThreads, for example:
pbs.NumThreads = 8;
This doesn't increase the core count of your PBS job. Therefore, you'll have 16 workers running, each with access to 8 computational threads, all on the same 16 cores of the single node: 128 threads competing for 16 cores. Conversely, if you set NumThreads to 1, all the non-parallel code (for example, a multithreaded function such as corr called outside a parfor loop) gets only a single computational thread.
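To answer points 1 and 2 with numbers, it can help to time the pieces separately. The sketch below is a hypothetical instrumented runMe (the loop body stays elided, as in the original) that reports the pool size and thread count it actually sees and times only the parfor section. The second snippet, run on the head node, reads the job's timestamps to estimate how long it waited in the PBS queue. It assumes a recent release in which parallel.Job exposes the datetime properties SubmitDateTime, StartDateTime and FinishDateTime; older releases expose character-vector timestamps instead.

```matlab
% runMe.m -- hypothetical instrumented version of runMe
function runMe
    % parpool("local",16);   % uncomment for methods 1 and 2
    pool = gcp('nocreate');  % in method 3 the pool comes from batch(...,'Pool',15)
    if isempty(pool)
        fprintf('No pool is open; parfor will run serially.\n');
    else
        fprintf('Pool workers: %d\n', pool.NumWorkers);
    end
    % Threads available to code outside the parfor loops (e.g. corr)
    fprintf('maxNumCompThreads: %d\n', maxNumCompThreads);

    t = tic;
    parfor idx = 1:160
        % ... original loop body ...
    end
    fprintf('parfor section: %.1f s\n', toc(t));
end
```

```matlab
% Head node: submit, wait, then compare queue wait with run time
pbs = parcluster('singleNodeJob');
job = pbs.batch(@runMe, 0, {}, 'Pool', 15);
wait(job);
diary(job)                                          % fprintf output from runMe
queueWait = job.StartDateTime - job.SubmitDateTime  % time spent waiting in the PBS queue
runTime   = job.FinishDateTime - job.StartDateTime  % time from job start to finish
```

If queueWait accounts for most of the extra 15-25 minutes, the slowdown comes from scheduling rather than from the code; if the parfor section itself is slower, the thread settings in point 3 are the more likely cause.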
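Following point 3, one option is to keep the total number of computational threads within the 16 cores PBS allocates. This is a hedged sketch, assuming (as described above) that NumThreads does not change the core count the singleNodeJob profile requests, and that batch uses one worker to run runMe in addition to the pool. It picks NumThreads first and then sizes the pool so that (pool size + 1) × NumThreads ≤ 16. The value of 4 threads is an arbitrary illustration, not a recommendation.

```matlab
% Hypothetical sizing rule: (pool workers + 1 batch worker) * NumThreads <= cores
cores      = 16;  % matches -l nodes=1:ppn=16 in the singleNodeJob profile
numThreads = 4;   % illustrative choice only
poolSize   = floor(cores / numThreads) - 1;  % one extra worker runs runMe itself

fprintf('Pool of %d + 1 batch worker, %d threads each = %d threads on %d cores\n', ...
    poolSize, numThreads, (poolSize + 1) * numThreads, cores);

pbs = parcluster('singleNodeJob');
pbs.NumThreads = numThreads;   % computational threads per worker
job = pbs.batch(@runMe, 0, {}, 'Pool', poolSize);
```

This trades parfor workers for threads (4 workers with 4 threads each instead of 16 with 1). Whether that is faster depends on how much of runMe's time goes to multithreaded functions such as corr versus the parfor loops, so it is worth timing both configurations.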
  2 Comments
Raymond Norris on 7 Apr 2022
Good to see you again, David :)
One advantage PBS Pro has over TORQUE is that you can specify how the cores of a node (really, a chunk) are divided up. For example, a 10-core chunk could be split into 5 MPI processes with 2 OpenMP threads each. TORQUE lacks this granularity.
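For reference, the 10-core example above corresponds to a PBS Pro select statement using the mpiprocs and ompthreads chunk resources. This is a minimal jobscript fragment, not something the singleNodeJob profile generates; to our understanding PBS Pro lists one entry per MPI process in $PBS_NODEFILE and sets OMP_NUM_THREADS from ompthreads, and the two diagnostic lines simply check that layout. TORQUE does not accept this syntax.

```shell
#!/bin/sh
# PBS Pro only: one 10-core chunk, 5 MPI processes x 2 OpenMP threads each
#PBS -l select=1:ncpus=10:mpiprocs=5:ompthreads=2

# Check the layout PBS Pro assigned to this job
wc -l < "$PBS_NODEFILE"               # expected to count one line per MPI process
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```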
At our next call, let's discuss an idea that will address using your MATLAB Parallel Server license in lieu of MATLAB/Toolbox licenses.
