Why do I get CUDA execution errors when training my network on a GPU?

Why do I get the following error when training my neural network:
An unexpected error occurred during CUDA execution. The CUDA error was:
all CUDA-capable devices are busy or unavailable
The error occurs only when training on a GPU, not on the CPU.

Accepted Answer

MathWorks Support Team on 19 May 2021
Edited: MathWorks Support Team on 19 May 2021
The most likely cause is a kernel execution timeout: the operating system stops a GPU kernel that runs longer than its allowed time limit.
To confirm this, try running some simple gpuArray commands, such as:
A = gpuArray(rand(10))
B = A+1
If the above runs without any warnings or errors, the GPU itself is working, and the training error is likely due to kernel timeouts.
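As a further check, the GPU device object reports whether a kernel execution timeout is enforced on the selected card. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed; the variable names hasGpu and timeoutEnabled are illustrative and not part of the original answer.

```matlab
% Sketch: report whether the selected GPU enforces a kernel execution timeout.
hasGpu = gpuDeviceCount > 0;     % true if MATLAB can see at least one GPU
timeoutEnabled = false;          % stays false when no GPU is found

if hasGpu
    gpu = gpuDevice;             % the currently selected GPU device
    timeoutEnabled = logical(gpu.KernelExecutionTimeout);
    fprintf('Device:                   %s\n', gpu.Name);
    fprintf('Compute mode:             %s\n', gpu.ComputeMode);
    fprintf('Kernel execution timeout: %d\n', timeoutEnabled);
else
    disp('No GPU device was found.');
end
```

If the timeout is reported as 1, long-running kernels on this card can be stopped by the operating system, which is consistent with the diagnosis above.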
Some possible workarounds:
  1. Scale down your problem so that it does not time out (e.g. with a smaller network or less data), or use a different card that does not time out.
  2. Some GPUs can be switched to a compute-only mode (TCC), but others cannot. Check whether your GPU allows changing to that mode.
  3. Another possible workaround is to modify the registry to increase the TDR (Timeout Detection and Recovery) delay value as explained in the web page below:
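Separately from the page referenced above, the current state relevant to workarounds 2 and 3 can be inspected from MATLAB before changing anything. This is a minimal sketch for Windows only. It assumes the standard TdrDelay value under the GraphicsDrivers registry key and that nvidia-smi is on the system path; the variable name tdrDelaySeconds is illustrative. It only reads settings and does not modify them.

```matlab
% Sketch: read the current TDR delay and the GPU driver model (Windows only).
tdrDelaySeconds = NaN;           % NaN means the value was not read

if ispc
    try
        % TdrDelay is a REG_DWORD giving the timeout in seconds.
        tdrDelaySeconds = double(winqueryreg('HKEY_LOCAL_MACHINE', ...
            'SYSTEM\CurrentControlSet\Control\GraphicsDrivers', 'TdrDelay'));
    catch
        % Most likely the value is not set, in which case Windows
        % uses its default of 2 seconds.
        tdrDelaySeconds = 2;
    end
    fprintf('TDR delay: %g seconds\n', tdrDelaySeconds);

    % Ask nvidia-smi for the current driver model (WDDM or TCC).
    [status, out] = system( ...
        'nvidia-smi --query-gpu=name,driver_model.current --format=csv');
    if status == 0
        disp(out);
    else
        disp('nvidia-smi could not be run; driver model not reported.');
    end
else
    disp('TDR and TCC settings apply to Windows only.');
end
```

Changing either setting requires administrator rights, and a new TdrDelay value typically takes effect only after a restart. On cards that support it, the driver model is usually switched with nvidia-smi's -dm option from an elevated command prompt.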

Release

R2020b
