GPU large array slowdown

Stefan Weichert on 21 Oct 2022
Answered: Himanshu on 26 Sep 2023
Hi,
I am performing (a lot of) elementwise multiplications on large 3D matrices. For this I bought a GeForce RTX 3070 with 8 GB of VRAM.
Since the operation is elementwise, I can split the matrices into blocks and concatenate the resulting sub-matrices afterwards.
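Each output element depends only on the two input elements at the same position. Splitting along the third dimension and concatenating the partial results therefore reproduces the full product exactly. The original code was not posted, so here is a minimal sketch of the idea in plain Python rather than MATLAB.

It assumes 3D arrays are stored as lists of 2D pages. The names blockwise_times, full_times and pages_per_block are hypothetical. In MATLAB, one block would correspond to a slice such as A(:,:,k1:k2).

```python
def multiply_page(p, q):
    """Elementwise product of two equally sized 2-D pages (lists of rows)."""
    return [[x * y for x, y in zip(row_p, row_q)] for row_p, row_q in zip(p, q)]


def full_times(a, b):
    """Reference result: multiply every page of a and b in one pass."""
    return [multiply_page(p, q) for p, q in zip(a, b)]


def blockwise_times(a, b, pages_per_block):
    """Same product, computed block by block along the page axis.

    On a GPU, each block would be transferred, multiplied and gathered back
    before the next one is processed (hypothetical workflow, not the
    poster's code).
    """
    if pages_per_block < 1:
        raise ValueError("pages_per_block must be at least 1")
    out = []
    for start in range(0, len(a), pages_per_block):
        block_a = a[start:start + pages_per_block]
        block_b = b[start:start + pages_per_block]
        # Concatenate this block's pages onto the result.
        out.extend(multiply_page(p, q) for p, q in zip(block_a, block_b))
    return out
```

The block size only changes how much data is resident at once, not the result. Any slowdown that depends on block size must therefore come from how the data is moved or managed, not from the arithmetic.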
I have already made sure that the combined size of the matrices is less than half of the VRAM, so that the automatic copying that sometimes happens has enough space.
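The memory budget can be turned into a block size with simple arithmetic. This is a minimal sketch under stated assumptions:

- C = A .* B keeps three arrays of the block's size resident (A, B and C).
- No other temporaries exist.
- The card's 8 GB is treated as 8 GiB.

In MATLAB, the budget would instead come from the memory figures that gpuDevice reports. The helper names below are hypothetical.

```python
import math


def array_bytes(shape, bytes_per_element):
    """Size in bytes of a dense array with the given shape."""
    return math.prod(shape) * bytes_per_element


def max_pages_per_block(rows, cols, bytes_per_element, n_resident, budget_bytes):
    """Largest number of rows-by-cols pages per block such that n_resident
    arrays of that block size (e.g. A, B and C = A .* B) fit in budget_bytes."""
    bytes_per_page = rows * cols * bytes_per_element * n_resident
    return budget_bytes // bytes_per_page


VRAM_BYTES = 8 * 1024**3  # assumption: nominal 8 GB card treated as 8 GiB
DOUBLE, SINGLE = 8, 4     # bytes per element

if __name__ == "__main__":
    # Compare the 50% budget from the question with the 10% one that runs fast.
    for label, fraction in (("50%", 0.5), ("10%", 0.1)):
        budget = int(fraction * VRAM_BYTES)
        pages = max_pages_per_block(1024, 1024, DOUBLE, 3, budget)
        print(f"{label} budget: up to {pages} pages of 1024x1024 doubles per block")
```

Switching from double to single precision halves bytes_per_element and roughly doubles the number of pages per block.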
However, if I use more than about 10% of the VRAM, the whole calculation slows down considerably (4x-5x). I could of course limit the usable VRAM to 10% of what gpuDevice reports, but that seems kind of silly, since I have all this free space and cannot use it. Why is that?
Also, when going over about 10%, I notice that the Windows Task Manager shows a lot of "copy" activity on the graphics card. I guess this is related, but I don't understand it. Is there perhaps another limitation with GPUs, such as a maximum variable size?
Thanks in advance to whoever takes the time to help me understand more :)
Best regards
ps.: I don't provide the code because it is scattered across several functions and subfunctions.
Matt J on 21 Oct 2022
Edited: Matt J on 21 Oct 2022
"However, if I utilize more than ca 10% of VRAM the whole calculation is slowed down a lot (4x-5x)."
Does that include the time to shuttle the data from the CPU to the GPU and back?
Also, how are you measuring the time? Using gputimeit, or something else?
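Timing load, compute and save separately shows where the extra time goes. This is a generic sketch in Python, not MATLAB; time_stages and the stage names are hypothetical.

It takes the median of repeated runs so that one-off effects have less influence. In MATLAB, GPU operations run asynchronously, so the compute stage would need to wait for the device before the clock is stopped, as gputimeit does.

```python
import statistics
import time


def time_stages(stages, repeats=5):
    """Run named (name, callable) stages in order, `repeats` times.

    Returns the median wall-clock seconds per stage, keyed by stage name
    in the order given.
    """
    samples = {name: [] for name, _ in stages}
    for _ in range(repeats):
        for name, fn in stages:
            t0 = time.perf_counter()
            fn()  # for GPU work, this must block until the device is done
            samples[name].append(time.perf_counter() - t0)
    return {name: statistics.median(ts) for name, ts in samples.items()}


if __name__ == "__main__":
    a = list(range(200_000))
    b = list(range(200_000))
    medians = time_stages([
        ("load (stand-in)", lambda: time.sleep(0.01)),
        ("multiply", lambda: [x * y for x, y in zip(a, b)]),
    ])
    for name, seconds in medians.items():
        print(f"{name}: {seconds:.4f} s")
```

If only the compute stage grows when the blocks get larger, the slowdown is on the GPU side. If load and save dominate, tic/toc over the whole run hides that.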
Stefan Weichert on 22 Oct 2022
Hi Matt J,
Moving the data between the CPU and GPU should take about the same time in both cases, since I only divide the same data into smaller portions. The vast majority of the time is spent on the pointwise matrix multiplication.
I measure the time simply with tic/toc, so the measurement includes loading the input file and storing the output file.
I don't think any of this is relevant. All I change is that I split the data into smaller blocks instead of pushing it onto the GPU all at once, and then I perform the same operations. (This actually requires storing some intermediate files, which slows it down a bit, but even including that, it is still 4-5 times faster.) My guess is that I am crossing some other threshold that is handled automatically to avoid some kind of overflow or oversized allocation, but that involves constant copying...

Himanshu on 26 Sep 2023
Hello Stefan,
I understand that you are trying to perform elementwise multiplication on large 3D matrices using a GeForce 3070 GPU and are facing performance slowdowns when more than 10% of the VRAM is utilized.
The slowdown might be due to a few factors:
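  • Memory Bandwidth Saturation: Elementwise multiplication performs very little arithmetic per byte it reads and writes, so its speed is limited by memory bandwidth rather than by compute. If working with larger arrays adds data transfers (for example, between host and device memory), more time can end up being spent on moving data than on processing it.
  • Memory Management Overhead: When VRAM usage is high, the driver and MATLAB's memory manager may spend considerable time freeing, reallocating or moving data instead of computing. Such data movement would be consistent with the "copy" activity you see in Task Manager.
  • Thermal Throttling: Sustained heavy load can raise the GPU temperature until the card lowers its clock speeds (thermal throttling), which slows down the computation.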
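A back-of-the-envelope estimate makes the bandwidth argument concrete. This is a minimal sketch that assumes C = A .* B reads each element of A and B once and writes each element of C once.

It also assumes the RTX 3070's nominal 448 GB/s peak memory bandwidth, which is a spec value, not a measurement. The function name is hypothetical.

```python
def min_kernel_seconds(n_elements, bytes_per_element, bandwidth_bytes_per_s,
                       reads=2, writes=1):
    """Lower bound on the runtime of C = A .* B when it is purely memory-bound:
    every element of A and B is read once and every element of C written once."""
    traffic_bytes = n_elements * bytes_per_element * (reads + writes)
    return traffic_bytes / bandwidth_bytes_per_s


RTX3070_BANDWIDTH = 448e9  # nominal peak in bytes/s (spec value, not measured)

if __name__ == "__main__":
    n = 1024 * 1024 * 100  # e.g. a 1024x1024x100 array of doubles
    t = min_kernel_seconds(n, 8, RTX3070_BANDWIDTH)
    print(f"{n} doubles: at least {t * 1e3:.2f} ms per multiplication")
```

This bound grows linearly with the number of elements, so it predicts the same time per element for small and large blocks. A per-element slowdown that appears only above a certain memory footprint therefore suggests additional data movement rather than the multiplication itself.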
To address this issue, you can:
  1. Optimize Memory Usage: Try to minimize the amount of data that needs to be stored in VRAM at any one time. This could involve reorganizing your computations to reduce memory usage, or using smaller data types (for example, single instead of double precision).
  2. Use a GPU Profiling Tool: Tools like NVIDIA's Nsight can provide detailed information about the GPU's memory usage and computation time, helping you identify the cause of the slowdown.
  3. Monitor GPU Temperature: Keep an eye on the GPU's temperature to see if thermal throttling might be causing the slowdown.
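One way to watch temperature, clock speed and memory use together while the MATLAB job runs is to poll nvidia-smi. This is a minimal sketch that assumes nvidia-smi (installed with the NVIDIA driver) is on the PATH; read_sample and parse_sample are hypothetical helpers.

With nounits, memory values are reported in MiB and clocks in MHz.

```python
import subprocess

# Query fields understood by nvidia-smi --query-gpu.
FIELDS = ["temperature.gpu", "clocks.sm", "memory.used", "memory.total", "utilization.gpu"]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi (noheader, nounits) into a dict."""
    values = [float(v) for v in line.split(",")]
    return dict(zip(FIELDS, values))


def read_sample():
    """Query the first GPU once; raises CalledProcessError if nvidia-smi fails."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])
```

Calling read_sample() about once per second during a slow run and a fast run allows a comparison. If clocks.sm drops while temperature.gpu is high, throttling is a likely contributor. If the clocks stay stable while memory.used is high, memory management is the more likely suspect.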
You can refer to the below documentation to understand more about GPU computing in MATLAB.

Release: R2021b
