CNN training failing on multi-gpu environment
I am trying to train a CNN using the 'multi-gpu' execution environment. Training works fine with the 'auto' or 'gpu' option on a single GPU, but I want to make use of all four GPUs I have available. All four are in the local machine, which runs CentOS, and the drivers are up to date. I also tested all of the GPUs in a local parallel pool with the MATLAB example found here, and it worked fine: https://www.mathworks.com/help/parallel-computing/examples/run-matlab-functions-on-multiple-gpus.html
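For context, the setup described above corresponds roughly to the sketch below. The mini-batch size and the names imds and layers are placeholders chosen for illustration, not the actual data or network from this post.

```matlab
% Sketch of the training configuration described above.
% Requires Deep Learning Toolbox and Parallel Computing Toolbox.
numGPUs = gpuDeviceCount;   % four on the machine described in this post
fprintf('Detected %d GPU(s)\n', numGPUs);

% With 'multi-gpu', trainNetwork trains on a local parallel pool and
% starts one (sized to the number of available GPUs) if none is open.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'multi-gpu', ...
    'MiniBatchSize', 128, ...   % assumed value, not from the post
    'Verbose', true);

% imds and layers are hypothetical placeholders for the datastore and CNN.
% net = trainNetwork(imds, layers, options);
```

Switching 'ExecutionEnvironment' back to 'gpu' in the same call is the single-GPU configuration that trains without errors.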
These are the errors I receive. What can I do to make this work?
Error using trainNetwork (line 150)
The parallel pool that SPMD was using has been shut down.
Caused by:
Error using nnet.internal.cnn.DistributedDispatcher/computeInParallel (line 190)
The parallel pool that SPMD was using has been shut down.
Error using internal.matlab.desktop.editor.clearAndSetBreakpointsForFile (line 45)
The client lost connection to worker 3. This might be due to network problems, or the interactive communicating job
might have errored.
Warning: 4 worker(s) crashed while executing code in the current parallel pool. MATLAB will attempt to run the code
again on the remaining workers of the pool. View the crash dump files to determine what caused the workers to crash.
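Since the warning says the workers themselves crashed, one way to narrow this down is to check that each pool worker can select and use its own GPU outside of trainNetwork. The following is a minimal diagnostic sketch, not a fix. It assumes one worker per GPU and uses labindex, which is the worker index inside spmd in releases of this era.

```matlab
% Diagnostic sketch: confirm every pool worker can use a GPU on its own.
pool = gcp('nocreate');             % reuse an open pool if there is one
if isempty(pool)
    pool = parpool(gpuDeviceCount); % assumption: one worker per GPU
end

spmd
    if labindex <= gpuDeviceCount
        d = gpuDevice(labindex);        % worker k selects GPU k
        x = rand(1000, 'gpuArray');     % small allocation on that GPU
        s = gather(sum(x(:)));          % force the computation to run
        fprintf('Worker %d: %s, %.0f MB free, checksum %.1f\n', ...
            labindex, d.Name, d.AvailableMemory / 2^20, s);
    end
end
```

If a worker crashes here as well, the problem is likely in the GPU or driver setup for that worker rather than in the network or the training options. If all workers report normally, the crash dump files mentioned in the warning are the next place to look.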
Answers (1)
Peng
on 23 Dec 2019
Hi, I've got the same problem. Have you solved it already? I haven't found a solution yet. I'm using MATLAB R2018b, running on a computer with 2 GPUs and Ubuntu.