Why do I receive an error that no supported GPU device was found when submitting a job to a MATLAB Parallel Server cluster using Slurm?

The job fails with the following error:
Unable to find a supported GPU device.

Accepted Answer

MathWorks Support Team on 10 Apr 2024
Edited: MathWorks Support Team on 10 Apr 2024
This error can occur for any of the following reasons:
  • MATLAB Parallel Server cannot detect the node's GPU
  • The GPUsPerNode property has not been added to the integration scripts
  • The GPU is not requested correctly in the cluster profile
  • Slurm's configuration does not make any GPUs available
To check whether MATLAB Parallel Server can detect a GPU, run this command on the affected worker node:
Linux
matlab -dmlworker -r "gpuDevice"
Windows
matlab -dmlworker -batch "gpuDevice"
Use the latest integration scripts with your cluster profile. With the integration scripts in place, add the following to the file getCommonSubmitArgs.m:
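% GPU: request GPUs through Slurm's generic resource (GRES) option
ngpus = validatedPropValue(ap, 'GPUsPerNode', 'double', 0);
if ngpus > 0
    % Optional card type; left empty, any GPU type is accepted
    gcard = validatedPropValue(ap, 'GPUCard', 'char', '');
    commonSubmitArgs = sprintf('%s --gres=gpu:%s:%d', commonSubmitArgs, gcard, ngpus);
    % With no card type the string contains "gpu::N"; collapse it to "gpu:N"
    commonSubmitArgs = strrep(commonSubmitArgs, '::', ':');
end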
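You can then set the AdditionalProperty "GPUsPerNode" (and, optionally, "GPUCard") in your cluster profile to specify the number of GPUs per node. Otherwise, add the option to your AdditionalSubmitArgs yourself, replacing the placeholders in "--gres=gpu:%s:%d" with the card type and the GPU count, for example "--gres=gpu:2". Use one of these two methods to request GPUs per node.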
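As a rough sketch of the first method, the properties can be set from the MATLAB client and checked with a small test job. The profile name 'mySlurmCluster' and the card type 'a100' are placeholders, not values from this answer, and the sketch assumes the GPU block has already been added to getCommonSubmitArgs.m.

```matlab
% Sketch only: 'mySlurmCluster' and 'a100' are placeholder names.
c = parcluster('mySlurmCluster');

% These properties are read by the GPU block in getCommonSubmitArgs.m
c.AdditionalProperties.GPUsPerNode = 1;
c.AdditionalProperties.GPUCard = 'a100';   % optional; skip to accept any card type

% Alternative when the integration scripts are left unmodified:
% c.AdditionalProperties.AdditionalSubmitArgs = '--gres=gpu:1';

saveProfile(c);   % store the settings in the cluster profile

% Submit a one-output test job that counts the GPUs visible to the worker
j = batch(c, @gpuDeviceCount, 1, {});
wait(j);
out = fetchOutputs(j);
fprintf('GPUs visible to the worker: %d\n', out{1});
```

A count of 0 suggests that the GPU request did not reach Slurm or that the node's GPU is not visible to the worker.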
If none of these steps resolves the error, make sure that GPUs have been defined in the Slurm configuration files (slurm.conf and gres.conf).
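For orientation, a GPU definition in the Slurm configuration might look like the fragment below. This is a sketch, not a complete configuration: the node name gpunode01, the GPU count, and the device paths are assumptions, and the other parameters a node definition normally carries are left out.

```
# slurm.conf (fragment): declare the GRES type and the GPUs on the node
GresTypes=gpu
NodeName=gpunode01 Gres=gpu:2

# gres.conf on gpunode01 (fragment): map the GRES to the device files
Name=gpu File=/dev/nvidia[0-1]
```

Changes to these files usually take effect only after the Slurm daemons are restarted or reconfigured, so check with your cluster administrator before editing them.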
