Parpool errors on SLURM computing system

I'm running a script that relies on a parfor loop. To initialize my parpool I use the command
parpool(20) % where 20 is the number of cores I have requested.
Occasionally I get the error
Error using parpool (line 113)
Failed to start a parallel pool. (For information in addition to the causing
error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line
675)
Failed to start pool.
Error using parallel.Job/submit (line 351)
Error closing file
/gpfs/home/cholt/.matlab/local_cluster_jobs/R2017b/Job47.in.mat.
The file may be corrupt.
My code sometimes runs fine and sometimes fails with this error. I don't understand why the error appears only intermittently. How can I fix it? Thanks for your help. -C

Answers (2)

Zenin Easa Panthakkalakath
Hey Caleb,
It is possible that one or more of the workers never fully started. Certain preference settings can cause this. Try regenerating the MATLAB preferences.
Also, please check whether your "startup.m" file contains any commands that alter the MATLAB preferences. If it does, comment them out and run the program again.
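A quick way to check both points from the MATLAB prompt might look like the sketch below. It only uses the built-in functions exist, which and prefdir; the messages it prints are illustrative and not part of any MATLAB output.

```matlab
% Check whether a startup.m is visible on the MATLAB path.
% exist returns 2 when the name resolves to a file.
if exist('startup', 'file') == 2
    fprintf('startup.m found at: %s\n', which('startup'));
else
    fprintf('No startup.m on the MATLAB path.\n');
end

% Show where MATLAB keeps its preferences for this release.
prefFolder = prefdir;
fprintf('Preference directory: %s\n', prefFolder);
```

If a startup.m shows up, open it and look for commands that change preferences before trying parpool again.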
Regards
Zenin
  1 Comment
Caleb_Holt on 17 Aug 2018
Hey Zenin
I haven't touched the preference settings at all; that file is still empty. I also don't have a startup.m file. I think the problem is that MATLAB is trying to access a job file that doesn't exist. I'm not sure whether I should create those files myself or what else needs to happen.



Zenin Easa Panthakkalakath
Hey Caleb,
I understand that you haven't made any changes to the preference settings. In that case, the cluster profile may not have been validated. Please validate the 'local' profile in the Cluster Profile Manager, as the error message suggests.
Please let me know if this works for you.
Regards
Zenin
  3 Comments
Caleb_Holt on 20 Aug 2018
In particular, these lines of the error message:
Error using parallel.Job/submit (line 351)
Error closing file
/gpfs/home/cholt/.matlab/local_cluster_jobs/R2017b/Job47.in.mat.
The file may be corrupt.
I think the cause is that I've reached the space limit on my home directory, where the .matlab directory is stored. How can I change where these job .mat files are stored?
Zenin Easa Panthakkalakath
Hey Caleb,
The directory that you've mentioned is the preference directory.
>> prefdir % returns the preference directory
To set a custom preference directory, please try the solution in this article.
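Another option that may be worth trying is to leave the preference directory alone and point only the job files somewhere else, through the JobStorageLocation property of the cluster object. The following is a minimal sketch under some assumptions: the folder name is hypothetical, tempdir is assumed to have free space on the compute node, and SLURM_JOB_ID is assumed to be set by the scheduler (it is only used to keep concurrent jobs apart).

```matlab
% Hypothetical storage folder outside the home directory.
% Replace tempdir with any location that has free space (e.g. scratch).
jobId = getenv('SLURM_JOB_ID');   % empty when not running under SLURM
storageDir = fullfile(tempdir, ['matlab_jobs_' jobId]);
if ~exist(storageDir, 'dir')
    mkdir(storageDir);            % the folder must exist before it is assigned
end

c = parcluster('local');              % cluster object for the 'local' profile
c.JobStorageLocation = storageDir;    % job .mat files are written here

% Request 20 workers as in the original call, capped at what the profile allows.
nWorkers = min(20, c.NumWorkers);
pool = parpool(c, nWorkers);
```

Using a separate folder per SLURM job also keeps several MATLAB sessions from writing into the same job storage folder at once, which could be another reason the error shows up only occasionally.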
Regards
Zenin

Release

R2017b
