GPU computation freezes randomly on Windows 10
I'm experiencing a strange problem using GPU computation on a Windows 10 machine.
The function that causes the problem is a simple random walk called with arrayfun() for computation on the GPU, so nothing fancy there. Since it only adds a random step to the position for a fixed number of time steps, in theory it cannot get stuck.
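For readers without the original code, a kernel of the kind described might look like the minimal sketch below. This is not the code from the question: the names `randomWalkDemo` and `walkOne`, the one-dimensional walk, and the uniform step distribution are all assumptions made for illustration.

```matlab
function xEnd = randomWalkDemo(nWalkers, nSteps)
% Hypothetical sketch: one independent 1-D random walk per GPU thread.
% Save as randomWalkDemo.m. Requires Parallel Computing Toolbox and a
% supported NVIDIA GPU.
x0 = zeros(nWalkers, 1, 'gpuArray');   % start positions, stored on the GPU

% With gpuArray input, arrayfun compiles walkOne into a single GPU kernel.
% The scalar nSteps is expanded to every element.
xEnd = arrayfun(@walkOne, x0, nSteps);

xEnd = gather(xEnd);                   % copy the result back to host memory
end

function x = walkOne(x, nSteps)
% Add one uniform random step from (-0.5, 0.5) per time step.
for k = 1:nSteps
    x = x + rand - 0.5;
end
end
```

A call such as `xEnd = randomWalkDemo(1e5, 1e6);` runs the whole walk inside one kernel launch, so a large step count means a long-running kernel.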
The exact same code runs perfectly fine on Windows 7 and Windows 8.1 on the same machine, using a GTX 1070 with the TdrLevel 0 registry entry. I tried several different driver versions on Windows 10, but after some random time the computation freezes. The GPU load remains at 100%, but the power consumption drops from 45% to 25% and stays there indefinitely. There is also no monitor connected to this GPU.
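The TdrLevel entry mentioned above is the Windows timeout detection and recovery setting, stored under the `GraphicsDrivers` registry key, where a value of 0 disables timeout detection. A minimal sketch for checking the current value from MATLAB follows. It runs on Windows only, and the messages are illustrative.

```matlab
% Read the TDR setting from the registry (Windows only).
subkey = 'SYSTEM\CurrentControlSet\Control\GraphicsDrivers';
try
    tdrLevel = winqueryreg('HKEY_LOCAL_MACHINE', subkey, 'TdrLevel');
    fprintf('TdrLevel = %d\n', tdrLevel);
catch
    % winqueryreg throws an error if the value cannot be read,
    % for example because it has never been set.
    disp('TdrLevel could not be read; it may not be set.');
end
```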
Sometimes I can trigger this freeze by opening Task Manager or GPU-Z, so it seems that the GPU freezes when something tries to query information from it.
How can I debug the reason for this freeze when using arrayfun? When it freezes I cannot use Ctrl+C to stop the computation in MATLAB; I have to kill the MATLAB process. There is also no error in the Command Window.
Many thanks in advance, Dominik
8 Comments
Joss Knight
on 1 Nov 2017
Are you sure you're using the correct device? Try
for i = 1:gpuDeviceCount
    gpuDevice(i)   % select device i and display its properties
end
Cedric
on 1 Nov 2017
Also, sometimes this is going on:
Joss Knight
on 2 Nov 2017
I'll admit that the behaviour of Windows GPUs in WDDM mode often defies explanation, but what you have here is a graphics card with timeouts disabled running a long-running kernel and causing your graphics to become suspended. Logically, your GPU is doing some graphics. If this were a laptop it would be easy to explain.
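One way to test whether the length of a single kernel matters is to split the walk into several shorter launches. The sketch below is an assumption-laden illustration, not a fix proposed in this thread: `randomWalkChunked`, `walkOne`, and the `chunk` parameter are hypothetical names, and the walk itself is a stand-in for the original code.

```matlab
function x = randomWalkChunked(nWalkers, nSteps, chunk)
% Hypothetical sketch: the same walk as one long kernel, but issued as
% several short kernels of at most `chunk` steps each.
x = zeros(nWalkers, 1, 'gpuArray');
done = 0;
while done < nSteps
    n = min(chunk, nSteps - done);   % steps in this launch
    x = arrayfun(@walkOne, x, n);    % one short GPU kernel
    wait(gpuDevice);                 % block until this kernel has finished
    done = done + n;
end
x = gather(x);                       % copy the result back to host memory
end

function x = walkOne(x, nSteps)
% Add one uniform random step from (-0.5, 0.5) per time step.
for k = 1:nSteps
    x = x + rand - 0.5;
end
end
```

Because control returns to MATLAB between launches, Ctrl+C gets a chance to take effect, and a freeze can be narrowed down to a particular launch.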
It would be helpful to know what hardware you have and how it is configured. Can you run nvidia-smi and tell me what it says?
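The report can be captured without leaving MATLAB. This is a minimal sketch that assumes `nvidia-smi` is on the system PATH, which is not the case for every driver installation.

```matlab
% Run nvidia-smi from MATLAB and capture its report as text.
[status, report] = system('nvidia-smi');
if status == 0
    disp(report);
else
    % A nonzero status usually means the tool failed or was not found.
    warning('nvidia-smi failed or was not found (exit status %d).', status);
end
```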
Dominik Ludwig
on 3 Nov 2017
Edited: Dominik Ludwig
on 3 Nov 2017
Dominik Ludwig
on 3 Nov 2017
Joss Knight
on 5 Nov 2017
I'm afraid you've gone beyond my area of expertise. You probably need to talk to NVIDIA, since this appears to be a hardware configuration issue. (Or perhaps you'll find someone more useful than me on this forum, of course...)
Joss Knight
on 5 Nov 2017
In answer to your original question, you can't debug an arrayfun kernel in MATLAB, because it's not MATLAB code that's executing but a GPU kernel compiled from that code. But you can try attaching a CUDA debugger or analysing behaviour in one of the CUDA tools, like the Visual Profiler. The profiler can tell you quite a lot about running kernels.
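Although the compiled kernel cannot be stepped through, the element-wise function itself can be run on ordinary CPU arrays, where breakpoints work as usual. The sketch below uses a hypothetical stand-in `walkOne` for the original function. It checks only the logic of the code and will not reproduce a driver-level or hardware-level freeze.

```matlab
function xEnd = randomWalkCpuCheck(nWalkers, nSteps)
% Hypothetical sketch: run the element-wise function on CPU arrays so the
% MATLAB debugger can stop inside walkOne.
x0 = zeros(nWalkers, 1);             % ordinary array, not a gpuArray
% CPU arrayfun needs equally sized inputs, so nSteps is bound through an
% anonymous function instead of being passed as a second input.
xEnd = arrayfun(@(x) walkOne(x, nSteps), x0);
end

function x = walkOne(x, nSteps)
% Add one uniform random step from (-0.5, 0.5) per time step.
for k = 1:nSteps
    x = x + rand - 0.5;
end
end
```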
Dominik Ludwig
on 8 Nov 2017
Answers (0)