Initializing GPU on multiple workers causes an unknown error

I've noticed that the following simple code results in a weird error when I use R2016b on a machine with two GTX 1080 Ti cards and one K2200:
% start a _new_ Matlab instance first!
parpool(16);
fetchOutputs( parfevalOnAll(@() gather(gpuArray(1)),1) )
The error message I get:
Error using parallel.FevalOnAllFuture/fetchOutputs (line 69)
One or more futures resulted in an error.
Caused by:
Error using parallel.internal.pool.deserialize>@()gather(gpuArray(1))
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
<-- repeated multiple times -->
After that, all GPU functionality gets completely broken:
>> a=gpuArray(1)
Error using gpuArray
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
Even re-starting Matlab won't help. The fix is to clear the CUDA JIT cache folder, "%USERPROFILE%\AppData\Roaming\NVIDIA\ComputeCache".
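For reference, the cache cleanup can also be scripted. The following is a minimal sketch, assuming the default Windows cache location quoted above and that no other process is using the cache; the variable name cacheDir is only illustrative.

```matlab
% Sketch: delete the NVIDIA JIT cache folder on Windows.
% Assumes the default location quoted above. Close other MATLAB sessions
% first and restart MATLAB afterwards.
cacheDir = fullfile(getenv('USERPROFILE'), 'AppData', 'Roaming', 'NVIDIA', 'ComputeCache');
if exist(cacheDir, 'dir') == 7
    [ok, msg] = rmdir(cacheDir, 's');   % 's' removes the folder and its contents
    if ~ok
        warning('Could not remove %s: %s', cacheDir, msg);
    end
else
    fprintf('No cache folder found at %s\n', cacheDir);
end
```

The driver recreates the folder the next time it needs to compile something.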
However, the following "longer pre-initialization" works OK for me:
% start a _new_ Matlab instance first and clear CUDA JIT cache if there was an error.
gpuDevice(1)
gather(gpuArray(1))
parpool();
fetchOutputs( parfevalOnAll(@() gpuDevice(1),1) )
fetchOutputs(parfevalOnAll(@() gather(gpuArray(1)),1))
AFAIU:
  1. Matlab R2016b, which I use here, was designed for CUDA 7.5, and it ships no binaries for CUDA Compute Capability 6.1.
  2. That's why Matlab uses the CUDA JIT to recompile a ton (~400 MB) of stuff when the user calls any GPU-related function for the first time. (This also causes many "gpuDevice() is slow" questions.)
  3. There's something wrong with that JIT when it is combined with parpool (a race condition?).
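To check which devices fall outside the compute capabilities a given release was built for, the installed GPUs can be listed together with the CUDA versions MATLAB reports. This is a minimal sketch; it uses parallel.gpu.GPUDevice.getDevice with the intent of not selecting any device, although querying may still initialize the driver.

```matlab
% Sketch: list each GPU with its compute capability and the CUDA driver and
% toolkit versions reported by MATLAB. No device is selected here.
for idx = 1:gpuDeviceCount
    d = parallel.gpu.GPUDevice.getDevice(idx);
    fprintf('%d: %s, compute capability %s, driver %g, toolkit %g\n', ...
        d.Index, d.Name, d.ComputeCapability, d.DriverVersion, d.ToolkitVersion);
end
```

A device whose compute capability is newer than anything the release ships binaries for is the case where the JIT path described above would be expected to kick in.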
My system is: Windows 10, CUDA 8.0 (cuda_8.0.61_win10) with patch 2 (cuda_8.0.61.2_windows), nvidia driver r384.94. The CUDA_CACHE_MAXSIZE environment variable is set to 2147483647.
My questions:
  1. Is my "longer pre-initialization" workaround actually "safe"? Is it a real workaround for that "race condition"? Or is it only as good as the original (it might be stable on my specific system, but is likely to fail on some other)? Assume I have to stay with R2016b for now, targeting CUDA 8.0 and a Pascal GPU (building a dll).
  2. The same code works OK in R2017b-R2018a and above. Is that just because they don't use the CUDA JIT here? Or is the real underlying issue actually fixed? (I don't have a device with compute capability >6.x at hand, so I'm unable to check that.) R2017a behaves like R2016b here, even though it claims CUDA 8.0 support: it still writes something (but just ~40 MB) to the CUDA JIT cache, fails in test #1 and works in test #2.
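One way to probe the race-condition hypothesis is to let the workers initialize the GPU strictly one at a time. The sketch below is only an experiment under that assumption; nothing in this thread confirms that it avoids the error, and the pool size of 16 is taken from the failing example.

```matlab
% Sketch: serialize GPU initialization across workers, so that at most one
% worker touches the CUDA JIT cache at any moment.
parpool(16);
spmd
    for k = 1:numlabs
        if labindex == k
            gpuDevice(1);          % select the device on this worker only
            gather(gpuArray(1));   % force a trivial round trip through the GPU
        end
        labBarrier;                % everyone waits before the next worker starts
    end
end
```

If the error disappears with this ordering but returns when all workers start simultaneously, that would point towards concurrent access to the cache rather than towards the JIT output itself.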
  10 Comments
Joss Knight on 4 Jul 2018
I had a colleague check their dual GTX 1080 system and they saw no issues, with 16b or with the current version with a forced JIT.
Sounds interesting... But this does not give me the same behaviour: the ComputeCache is still almost empty after running those commands, only a few KB. It looks like files are being added and instantly erased. Hmm... Could you please advise: am I doing something wrong here? Were you able to make it populate the ComputeCache?
This works for me but ... possibly only when your card's architecture is the maximum supported or higher, because if it were lower there would be no compatible PTX in the libraries. So you'll need to run R2017a or R2017b for your Pascal card.
It would be good to establish why upgrading MATLAB is not an option for you.
Igor Varfolomeev on 8 Jul 2018
Edited: Igor Varfolomeev on 8 Jul 2018
It would be good to establish why upgrading MATLAB is not an option for you.
That's because in this particular case the request to me was to improve the performance without any major changes, like adding new external dependencies (e.g. newer MCR). For the next version, we'll definitely migrate to a newer Matlab.
-----------------------------------------------------------------------------------------------
This works for me but ... possibly only when your card's architecture is the maximum supported or higher, because if it were lower there would be no compatible PTX in the libraries. So you'll need to run R2017a or R2017b for your Pascal card.
Yep, I've used R2017b (because R2017a has the same issue as R2016b). But this trick does not work for me. I've just tried this once again: the ComputeCache size oscillates between 0 and a few MB while gpuDevice(1) is running (but it takes a few minutes, so it's definitely compiling something). In the end, the ComputeCache size is below 1 MB. That's strange. Just in case: I set this environment variable in cmd before starting Matlab, e.g.
Microsoft Windows [Version 10.0.15063]
(c) 2017 Microsoft Corporation. All rights reserved.
C:\>cd "C:\Program Files\MATLAB\R2017b\bin\"
C:\Program Files\MATLAB\R2017b\bin>echo %CUDA_CACHE_MAXSIZE%
2147483647
C:\Program Files\MATLAB\R2017b\bin>echo %CUDA_CACHE_DISABLE%
0
C:\Program Files\MATLAB\R2017b\bin>set CUDA_FORCE_PTX_JIT=1
C:\Program Files\MATLAB\R2017b\bin>echo %CUDA_FORCE_PTX_JIT%
1
C:\Program Files\MATLAB\R2017b\bin>matlab.exe
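To confirm that variables set in cmd actually reach MATLAB and the pool workers, they can be read back from inside MATLAB. This is a minimal sketch; it assumes a pool is already running or that the preferences allow gcp to start one.

```matlab
% Sketch: compare CUDA-related environment variables on the client and on
% the workers. Each worker returns its value wrapped in a cell, so that
% fetchOutputs can stack values of different lengths.
pool = gcp;   % current pool, or a new one if auto-creation is enabled
names = {'CUDA_CACHE_MAXSIZE', 'CUDA_CACHE_DISABLE', 'CUDA_FORCE_PTX_JIT'};
for n = 1:numel(names)
    name = names{n};
    f = parfevalOnAll(pool, @() {getenv(name)}, 1);
    onWorkers = fetchOutputs(f);              % one cell per worker
    distinct = unique(onWorkers)';            % row cell for strjoin
    fprintf('%s: client=''%s'', distinct worker values: ''%s''\n', ...
        name, getenv(name), strjoin(distinct, ''', '''));
end
```

An unset variable shows up as an empty string on both sides.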
In R2018a update 3, trying to run gpu-related commands with CUDA_FORCE_PTX_JIT=1 produces a different result. The ComputeCache remains empty. There's almost no delay. And it fails on convn:
>> getenv('CUDA_FORCE_PTX_JIT')
ans =
'1'
>> gpuDevice(1);
Warning: The CUDA driver must recompile the GPU libraries because CUDA_FORCE_PTX_JIT is set to '1'. Recompiling can take several minutes. Learn more.
> In parallel.internal.gpu.selectDevice
In parallel.gpu.GPUDevice.select (line 58)
In gpuDevice (line 21)
>> gpuDevice(2);
Warning: The CUDA driver must recompile the GPU libraries because CUDA_FORCE_PTX_JIT is set to '1'. Recompiling can take several minutes. Learn more.
> In parallel.internal.gpu.selectDevice
In parallel.gpu.GPUDevice.select (line 58)
In gpuDevice (line 21)
>> a=gpuArray(zeros([9 9 9]));
>> b=gpuArray(zeros([3 3 3]));
>> c=convn(a,b)
Error using gpuArray/convn
An unexpected error occurred trying to launch a kernel. The CUDA error was:
invalid device symbol
It probably fails because R2018a is designed for CUDA 9, and my current GPU driver does not support it. This is as expected. The strange part is that the ComputeCache is just empty.
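The cache sizes quoted in this thread can be measured from MATLAB rather than from Explorer. The following sketch assumes the default Windows cache location mentioned in the question and relies on the recursive '**' wildcard of dir, which to my knowledge requires R2016b or later.

```matlab
% Sketch: report the number of files and the total size of the CUDA JIT cache.
cacheDir = fullfile(getenv('USERPROFILE'), 'AppData', 'Roaming', 'NVIDIA', 'ComputeCache');
entries = dir(fullfile(cacheDir, '**', '*'));   % recursive listing
files = entries(~[entries.isdir]);              % drop folder entries
fprintf('%d files, %.1f MB in %s\n', numel(files), sum([files.bytes]) / 2^20, cacheDir);
```

Running it before and after the first gpuDevice call shows how much the driver actually wrote.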
-----------------------------------------------------------------------------------------------
I had a colleague check their dual GTX 1080 system and they saw no issues, with 16b or with the current version with a forced JIT.
Thanks for testing this! But could you please also specify, what was the NVidia driver version?
Given that it works in the "current version", that's probably some newer driver with CUDA 9 support. Maybe this means that the issue is fixed in newer NVidia drivers. I think I should test this myself as well.
However, maybe this issue does, after all, depend on "mixing Pascal and Maxwell". It looks like at least some aspects depend on it. I've recently noticed that this fails:
% clear CUDA JIT cache and restart Matlab first
gpuDevice(1);
parpool(16);
fetchOutputs( parfevalOnAll(@gpuDevice,1) )
but this works (tested twice):
% clear CUDA JIT cache and restart Matlab first
gpuDevice(1);
parpool(16);
fetchOutputs( parfevalOnAll(@() gpuDevice(1),1) )
this works as well (tested twice):
% clear CUDA JIT cache and restart Matlab first
gpuDevice(1);
parpool(16);
spmd
gpuDevice(mod(labindex,2)+1);
gather(gpuArray(1));
end
but this fails:
% clear CUDA JIT cache and restart Matlab first
gpuDevice(1);
parpool(16);
spmd
gpuDevice(mod(labindex,2)+2);
gather(gpuArray(1));
end
Starting parallel pool (parpool) using the 'local' profile ... connected to 16 workers.
Warning: An error has occurred during SPMD execution. An attempt has been made to interrupt execution on the workers. If this situation persists, it may be necessary to
interrupt execution using CTRL-C and then deleting and restarting the parallel pool.
The error that occurred on worker 13 is:
Error using gpuDevice (line 26)
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
.
> In spmdlang.RemoteSpmdExecutor/maybeWarnIfInterruptedAndWaiting (line 300)
In spmdlang.RemoteSpmdExecutor/isComputationComplete (line 131)
In spmdlang.spmd_feval_impl (line 19)
In spmd_feval (line 8)
Error detected on worker 13.
Caused by:
Error using gpuDevice (line 26)
An unexpected error occurred during CUDA execution. The CUDA error was:
unknown error
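The difference between the two spmd variants is easier to see when the labindex-to-device mapping is written out. The sketch below only evaluates the two expressions from the examples above; which physical card sits at index 3 is not stated in this thread, so reading device 3 as the K2200 is an assumption.

```matlab
% Sketch: which device indices each expression selects in a 16-worker pool.
workers = 1:16;                        % labindex values
mapWorking = mod(workers, 2) + 1;      % expression from the working example
mapFailing = mod(workers, 2) + 2;      % expression from the failing example
disp(unique(mapWorking));              % device indices used: 1 2
disp(unique(mapFailing));              % device indices used: 2 3
fprintf('worker 13 -> device %d\n', mapFailing(13));   % worker 13 -> device 3
```

So the working variant touches only devices 1 and 2, while the failing one sends every odd-numbered worker, including worker 13 from the error message, to device 3.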
-----------------------------------------------------------------------------------------------
UPD:
I've just tried Nvidia 397.93 driver. And now the original issue is gone, and this:
% clear CUDA JIT cache and restart Matlab first
parpool(16);
fetchOutputs( parfevalOnAll(@() gather(gpuArray(1)),1) )
works OK in R2016b (tested twice). And the ComputeCache size is much smaller - only ~140MB.
So, after all, it looks like the issue does not exist in newer driver versions. So, sorry for the buzz. I should have checked this before. :)
(But the CUDA_FORCE_PTX_JIT in R2017b still behaves the same for me, by the way.)


Accepted Answer

Igor Varfolomeev on 25 Nov 2018
As noted in comments, it looks like the issue does not exist in newer driver versions. So, I'm sorry for the buzz.
