Parallel computing on a cluster

I can use MATLAB on Pronghorn (a cluster at the University of Nevada, Reno). For my work I need to use the parallel computing features on this cluster.
This is my test code that uses parallel computing in MATLAB (I named it PCTest.m so I can call it from the sbatch file):
clear all
tic
parfor i = 1:5
    fprintf('Progress %d of %d...\n', i, 5);
    pause(5);
end
toc
If parallel computing is working, the code above should take about 5 seconds to run; run serially, the five 5-second pauses would take about 25 seconds.
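How close you get to 5 seconds depends on how many workers actually run the loop. The sketch below is an idealized estimate only: it assumes the 5 iterations are spread evenly over the workers and ignores pool startup and scheduling overhead, so real parfor timings will be somewhat longer and the split is not guaranteed.

```matlab
% Idealized wall-clock estimate for PCTest.m: 5 iterations, each pausing 5 s.
% Assumes an even split of iterations over workers and no overhead.
nIter = 5;   % iterations in the parfor loop
tIter = 5;   % seconds per iteration (pause(5))
for nWorkers = [1 2 5 6]
    % Each worker handles at most ceil(nIter/nWorkers) iterations in sequence.
    tWall = ceil(nIter / nWorkers) * tIter;
    fprintf('%d worker(s): about %d s\n', nWorkers, tWall);
end
```

With 1 worker this gives 25 s, with 2 workers 15 s, and with 5 or more workers 5 s, which is why a run of roughly 25 s suggests the loop is not running in parallel at all.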
This is my job submission script for the cluster (xyz.sh):
#!/bin/bash
#SBATCH --job-name disable_multithreading
#SBATCH --time=01:00:00
#SBATCH --nodes=1 --ntasks-per-node=6
#SBATCH --cpus-per-task=1
module load matlab
matlab -nosplash -nodesktop -r "PCTest; exit"
I run the MATLAB file with sbatch xyz.sh, but this does not start workers on the cluster the way it does on my desktop or laptop. Any help will be highly appreciated. Thanks
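To see what is going on inside the batch job, it can help to print what Slurm allocated and what MATLAB believes it may use before the parfor runs. This is a minimal diagnostic sketch, assuming Parallel Computing Toolbox is available on the cluster; printing SLURM_CPUS_ON_NODE and the 'local' profile's NumWorkers is an illustrative choice of mine, not something used in this thread.

```matlab
% Hypothetical diagnostic block for the top of PCTest.m.
% getenv returns '' when a variable is unset, e.g. outside a Slurm job.
disp(['SLURM_NTASKS       = "' getenv('SLURM_NTASKS') '"']);
disp(['SLURM_CPUS_ON_NODE = "' getenv('SLURM_CPUS_ON_NODE') '"']);
fprintf('maxNumCompThreads  = %d\n', maxNumCompThreads);

% The 'local' profile caps how many workers parpool may start on this node.
c = parcluster('local');
fprintf('local NumWorkers   = %d\n', c.NumWorkers);

% gcp('nocreate') returns [] instead of starting a new pool.
p = gcp('nocreate');
if isempty(p)
    disp('No parallel pool is open yet.');
else
    fprintf('Open pool has %d workers.\n', p.NumWorkers);
end
```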

Answers (1)

As written, the parallel pool starts when the parfor is reached, so the pool startup time is included in your timing. And since you're not starting the pool explicitly, it could start as many as 12 workers. Try the following:
clear all
% Time how long it takes to start the pool
tic
parpool(5);
toc
% Time how long it takes to run the parfor
tic
parfor i = 1:5
    fprintf('Progress %d of %d...\n', i, 5);
    pause(5);
end
toc
To make this more robust, query Slurm for the number of assigned tasks and size your pool from that (I'm doing this off the top of my head, but I think the variable is SLURM_NTASKS).
sz = getenv('SLURM_NTASKS');
if isempty(sz)
    % For some reason, we're not running in a Slurm job
    sz = maxNumCompThreads;
end
% Time how long it takes to start the pool
tic
parpool(sz-1);
toc
% Time how long it takes to run the parfor
tic
parfor i = 1:5
    fprintf('Progress %d of %d...\n', i, 5);
    pause(5);
end
toc
Lastly, since R2019a, you can shorten
matlab -nosplash -nodesktop -r "PCTest; exit"
to
matlab -batch PCTest

4 Comments

Muhammad Imran on 12 Apr 2021
Hi @Raymond Norris, this is the error I get after following your code:
/apps/matlab/R2020a/bin/matlab-glselector.sh: line 27: 214918 Aborted (core dumped) $MATLAB/bin/$ARCH/need_softwareopengl $display > /dev/null 2>&1
Starting parallel pool (parpool) using the 'local' profile ...
{Error using parpool (line 145)
You requested a minimum of 53 workers, but the cluster "local" has the
NumWorkers property set to allow a maximum of 6 workers. To run a communicating
job on more workers than this (up to a maximum of 512 for the Local cluster),
increase the value of the NumWorkers property for the cluster. The default
value of NumWorkers for a Local cluster is the number of physical cores on the
local machine.
Error in ABC (line 9)
parpool(sz-1);
}
There's an issue in my example. It should be as follows (note the else clause):
sz = getenv('SLURM_NTASKS');
if isempty(sz)
    % For some reason, we're not running in a Slurm job
    sz = maxNumCompThreads;
else
    sz = str2num(sz);
end
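Without the else clause, if SLURM_NTASKS is set, sz is a character string. Assuming sz is '6', then
sz-1
returns the numeric value 53, because MATLAB subtracts 1 from the character code of '6' (which is 54). That is where the request for 53 workers in your error comes from.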
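The 53 is easy to reproduce at the command line. In this short illustration a hard-coded '6' stands in for what getenv('SLURM_NTASKS') would return in a 6-task job; str2double is shown as an alternative to str2num (str2num evaluates its input with eval, while str2double only parses numbers and returns NaN for non-numeric text).

```matlab
% getenv always returns a char vector, even when the value looks numeric.
sz = '6';                 % stand-in for getenv('SLURM_NTASKS')
disp(double(sz))          % 54: the character code of '6'
disp(sz - 1)              % 53: arithmetic on a char uses its character code
n = str2double(sz);       % convert the text to a number first
disp(n - 1)               % 5
```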
Dear @Raymond Norris, now it generates this error message:
/apps/matlab/R2020a/bin/matlab-glselector.sh: line 27: 227803 Aborted (core dumped) $MATLAB/bin/$ARCH/need_softwareopengl $display > /dev/null 2>&1
{Error using parpool (line 145)
Parallel pool failed to start with the following error. For more detailed
information, validate the profile 'local' in the Cluster Profile Manager.
Error in ABC (line 10)
parpool(sz-1);
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 670)
Failed to start pool.
Error using parallel.Job/submit (line 355)
Error closing file
/data/gpfs/home/mimran/.matlab/local_cluster_jobs/R2020a/Job5.in.mat.
The file may be corrupt.
}
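I've seen this before. Are you submitting several Slurm jobs at once? By default, every job that uses the 'local' profile writes its files to the same ~/.matlab/local_cluster_jobs folder (the Job5.in.mat path in your error), so concurrent jobs can interfere with each other. If so, contact Technical Support (support@mathworks.com) and they'll show you the best way to create a temporary job storage location (i.e., a separate local_cluster_jobs folder) for each job.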
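The thread doesn't say exactly what Support recommends, but one commonly used pattern is to point the 'local' cluster's JobStorageLocation at a folder unique to each Slurm job before calling parpool. This is a minimal sketch, assuming Parallel Computing Toolbox and that Slurm sets SLURM_JOB_ID inside the job; the folder under tempdir, its matlab_jobs_ prefix, and the 'interactive' fallback label are hypothetical choices. It also caps the pool at the profile's NumWorkers, which avoids the "maximum of 6 workers" error from earlier in the thread.

```matlab
% Sketch: per-job storage folder plus a pool sized from the Slurm allocation.

% Tasks assigned by Slurm; str2double('') is NaN outside a Slurm job.
sz = str2double(getenv('SLURM_NTASKS'));
if isnan(sz)
    sz = maxNumCompThreads;       % fallback used in the thread
end

% Hypothetical per-job folder name, e.g. <tempdir>/matlab_jobs_<SLURM_JOB_ID>.
jobId = getenv('SLURM_JOB_ID');
if isempty(jobId)
    jobId = 'interactive';        % hypothetical label when not under Slurm
end
storage = fullfile(tempdir, ['matlab_jobs_' jobId]);
if ~exist(storage, 'dir')
    mkdir(storage);
end

% Change the storage location on this cluster object only (profile unchanged).
c = parcluster('local');
c.JobStorageLocation = storage;

% Mirror the thread's sz-1, but never exceed the profile limit or go below 1.
nWorkers = max(1, min(sz - 1, c.NumWorkers));
pool = parpool(c, nWorkers);

tic
parfor i = 1:5
    fprintf('Progress %d of %d...\n', i, 5);
    pause(5);
end
toc

delete(pool);                     % shut the pool down before MATLAB exits
```

The per-job folder is left behind after the job finishes; deleting it at the end of the script is optional.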

Release: R2021a
Asked: 12 Apr 2021
Commented: 13 Apr 2021