CNN training failing on multi-gpu environment

Kevin Shi on 24 Apr 2019
Answered: Peng on 23 Dec 2019
I am trying to train a CNN using the 'multi-gpu' execution environment. Training works fine with the 'auto' or 'gpu' option on a single GPU, but I want to use all four GPUs I have available. All four are on the local machine, which runs CentOS, and the drivers are up to date. I also tested all of the GPUs in a local parallel pool with the MATLAB example found here, and it worked fine: https://www.mathworks.com/help/parallel-computing/examples/run-matlab-functions-on-multiple-gpus.html
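A minimal sketch of the kind of setup described above, for reference. This is not the original training script: the random data, the tiny network, and the names `XTrain`, `YTrain`, and `layers` are placeholders, and every training option other than 'ExecutionEnvironment' is an assumption chosen only to keep the example short.

```matlab
% Placeholder data (assumed for illustration): 512 grayscale 28x28 images, 10 classes
XTrain = rand(28, 28, 1, 512, 'single');
YTrain = categorical(randi(10, 512, 1), 1:10);

% Placeholder network, small enough to train in seconds
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, 'Padding', 'same')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

% Report how many GPUs MATLAB can see before requesting multi-GPU training
fprintf('GPUs visible to MATLAB: %d\n', gpuDeviceCount);

% 'multi-gpu' asks trainNetwork to use multiple GPUs on the local machine
% through a local parallel pool; swapping in 'gpu' gives the single-GPU case
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'multi-gpu', ...
    'MiniBatchSize', 128, ...
    'MaxEpochs', 1, ...
    'Verbose', true);

net = trainNetwork(XTrain, YTrain, layers, options);
```

Running a small script like this with 'gpu' and then with 'multi-gpu' may help show whether the worker crashes depend on the real data and network or occur with any multi-GPU training run.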
These are the errors I receive. What can I do to make this work?
Error using trainNetwork (line 150)
The parallel pool that SPMD was using has been shut down.
Caused by:
Error using nnet.internal.cnn.DistributedDispatcher/computeInParallel (line 190)
The parallel pool that SPMD was using has been shut down.
Error using internal.matlab.desktop.editor.clearAndSetBreakpointsForFile (line 45)
The client lost connection to worker 3. This might be due to network problems, or the interactive communicating job
might have errored.
Warning: 4 worker(s) crashed while executing code in the current parallel pool. MATLAB will attempt to run the code
again on the remaining workers of the pool. View the crash dump files to determine what caused the workers to crash.
The client lost connection to worker 3. This might be due to network problems, or the interactive communicating job
might have errored.
Warning: 4 worker(s) crashed while executing code in the current parallel pool. MATLAB will attempt to run the code
again on the remaining workers of the pool. View the crash dump files to determine what caused the workers to crash.

Answers (1)

Peng on 23 Dec 2019
Hi, I have the same problem. Have you solved it already? I haven't found a solution yet. I'm using MATLAB R2018b, running on a computer with 2 GPUs and Ubuntu.

Release

R2018b
