- NVIDIA RTX 6000 Ada Generation | Total Memory: 51GB
- NVIDIA GeForce RTX 2080 SUPER | Total Memory: 8GB
gpuArray large sparse arrays. Error codes: "CUSPARSE_INTERNAL_ERROR" / "UNKNOWN_ERROR"
I have two GPUs: the first is an NVIDIA GeForce RTX 3090 Ti and the second is an NVIDIA GeForce RTX 2060 SUPER. I am running on a Linux machine with NVIDIA driver version 515.105.01 and CUDA version 11.7, using MATLAB R2022b (Update 7). When I create a large sparse gpuArray on the second (smaller) GPU there is no problem, but when I repeat this on the first (larger) GPU I get the error code UNKNOWN_ERROR, or sometimes CUSPARSE_INTERNAL_ERROR.
sample code:
%% Device 2 - small sparse array - no problem
gpuDevice(2)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 2 - large sparse array - no problem
gpuDevice(2)
a = speye(10000000,10000000);
a = gpuArray(a);
%% Device 1 - small sparse array - no problem
gpuDevice(1)
a = speye(100000,100000);
a = gpuArray(a);
%% Device 1 - large sparse array - problem!!
gpuDevice(1)
a = speye(10000000,10000000);
a = gpuArray(a);
Error:
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
Error in gpuTest (line 19)
a = gpuArray(a);
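To narrow down which device/size combinations fail, the four cases above can be run in a single loop that records the error identifier instead of stopping at the first failure. This is only a minimal sketch, not part of the original test script: the variable names (`sizes`, `status`) are illustrative, and it assumes Parallel Computing Toolbox is installed.

```matlab
% Sketch: try every device/size combination and report the outcome.
% Variable names are illustrative; the sizes match the cases above.
sizes = [1e5, 1e7];

for d = 1:gpuDeviceCount
    for n = sizes
        try
            % Select device d. Reselecting an already-selected device
            % resets it, which also clears any earlier failed state.
            gpuDevice(d);
            a = gpuArray(speye(n));   % host sparse identity -> GPU
            wait(gpuDevice);          % block until the device is idle
            status = 'ok';
        catch err
            % Keep the identifier and message for a support request.
            status = sprintf('%s (%s)', err.identifier, err.message);
        end
        fprintf('device %d, n = %d: %s\n', d, n, status);
        clear a
    end
end
```

The printed lines can be pasted directly into a support request, which is easier to read than four separate error traces.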
Device 2 is smaller (8 GB), so it shouldn't be able to handle larger arrays than device 1. Here are its details:
Name: 'NVIDIA GeForce RTX 2060 SUPER'
Index: 2
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8370192384 (8.37 GB)
AvailableMemory: 7891910656 (7.89 GB)
MultiprocessorCount: 34
ClockRateKHz: 1680000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
And here are the details of the GPU that crashes (the larger one):
Name: 'NVIDIA GeForce RTX 3090 Ti'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.7000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152 (49.15 KB)
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 25431965696 (25.43 GB)
AvailableMemory: 24284102656 (24.28 GB)
MultiprocessorCount: 84
ClockRateKHz: 1950000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
Any help or explanation would be much appreciated.
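As a sanity check on the memory argument, the host-side size of the large sparse identity can be compared with the AvailableMemory figures listed above. This is a rough sketch: the two byte counts are copied from the gpuDevice output in this post, the variable names are illustrative, and the storage layout on the device may differ from the host layout that `whos` reports.

```matlab
% Sketch: compare the sparse array's host footprint with GPU memory.
n = 1e7;
a = speye(n);                 % n nonzeros, stored as double

info = whos('a');
hostBytes = info.bytes;       % host-side storage reported by MATLAB

% AvailableMemory values copied from the device listings above.
availBytes = [24284102656, 7891910656];   % device 1, device 2

fprintf('Host storage: %.1f MB\n', hostBytes / 1e6);
fprintf('Share of available memory: device 1 %.2f%%, device 2 %.2f%%\n', ...
    100 * hostBytes ./ availBytes);
```

If the footprint is a small fraction of the available memory on both cards, running out of device memory is an unlikely explanation for the failure on the larger GPU.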
Answers (2)
Ayush on 26 Dec 2023 (Edited: Ayush on 26 Dec 2023)
I understand that you get errors when creating a large sparse gpuArray on the first GPU, which has the higher specifications, but not on the second GPU, which has the lower specifications. I tried to reproduce the issue with my 2 GPUs:
The error was reproducible only up to R2022b. From MATLAB R2023a onwards, the error has been fixed.
Thanks
Ayush Jaiswal
Joss Knight on 3 Jan 2024
Hi Joseph. It's hard to be definitive. There were some problems with cuSPARSE, and also with Windows drivers, when supporting the newest GeForce Ampere cards with CUDA 11.2, but I believed this to be fixed in R2023b and CUDA 11.8. Can you raise a support ticket and follow this up with MathWorks Support? Thanks.
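When raising the ticket, it may help to include the MATLAB release and the CUDA versions reported for the selected device. The following is a minimal sketch of how that information could be collected; it assumes R2020b or later (for `isMATLABReleaseOlderThan`) and Parallel Computing Toolbox, and the R2023a threshold simply reflects the release mentioned in the other answer rather than a confirmed fix version.

```matlab
% Sketch: gather version details to attach to a support request.
fprintf('MATLAB release: R%s\n', version('-release'));

% R2023a is the release from which the other answer could no longer
% reproduce the error; treat this threshold as an assumption.
if isMATLABReleaseOlderThan("R2023a")
    disp('This release is older than R2023a.');
end

if gpuDeviceCount > 0
    g = gpuDevice;   % currently selected device
    fprintf('%s: driver supports CUDA %.1f, toolkit is CUDA %.1f\n', ...
        g.Name, g.DriverVersion, g.ToolkitVersion);
end
```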