Why does gather() take only about 0.001 s in the Command Window but about 1 s inside a loop?

Hi everyone,
I wrote an algorithm to solve a set of PDEs. There are some very long equations in the middle of a while loop, so I compute them on the GPU to speed things up, and their results are then used in further calculations. The code runs successfully; however, the gather() function takes almost 1/5 of the total run time (3000 s). When I run "[dk1,dk2,dk3,dk4,dk5,dk6,dk7,dk8,dk9] = dkfunction(S,T)" in the Command Window, it takes only 0.001 s. Why does this code perform so badly inside a big loop? Is there a way to speed up gather(), or an alternative function? Also, do the array sizes in a MEX file have to be constant?
(Note: my code is very long, so I have attached the profiler output and a simplified version of the core part. dkfunction uses CUDA.)
while t<24*3600*250
%%%
%Omitted code for S and T calculation.
%%%
[dk1,dk2,dk3,dk4,dk5,dk6,dk7,dk8,dk9] = dkfunction(S,T);
[dkvTdT,dkvTdS,dkvhdT,dkvhdS,dphaivdT,dphaivdS,dkhdS,dkdS,dphaidS]=gather(dk1,dk2,dk3,dk4,dk5,dk6,dk7,dk8,dk9);
%%%
%Omitted code to use dkvTdt..... to calculate other variables and t steps.
%%%
end

Accepted Answer

Walter Roberson on 28 Dec 2020
gather waits for the GPU to finish. When you work at the command line, you have already started the GPU work, and it probably takes you a couple of seconds to type the timed gather command; in the meantime the GPU keeps working, so the answer is already available when gather runs.
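A minimal sketch of that effect, assuming Parallel Computing Toolbox and a supported GPU. The matrix multiply is a hypothetical stand-in for the work inside dkfunction. GPU operations are typically queued asynchronously, so a gather() issued right after the launch absorbs the compute time, while a gather() issued after a pause (like typing at the command line) mostly measures the copy.

```matlab
% Sketch: why gather() looks fast when typed at the command line.
% Assumes Parallel Computing Toolbox and a supported GPU; the matrix
% multiply is a hypothetical stand-in for the work done in dkfunction.
A = rand(4000, 4000, 'gpuArray');

B = A * A;                 % queued on the GPU; MATLAB typically returns at once
tic
C1 = gather(B);            % must wait for the multiply to finish, then copy
tImmediate = toc;

B = A * A;                 % queue the same work again
pause(2)                   % give the GPU time to finish (like typing a command)
tic
C2 = gather(B);            % result should be ready; mostly just the copy
tAfterPause = toc;

fprintf('gather right after launch: %.4f s\n', tImmediate);
fprintf('gather after a pause:      %.4f s\n', tAfterPause);
```

The exact numbers depend on the GPU, but the first gather() usually reports noticeably more time than the second, even though both transfer the same amount of data.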
Also, you should use gputimeit() to time GPU code.
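A short sketch of gputimeit() usage, assuming Parallel Computing Toolbox and a supported GPU. The inputs and the element-wise expression are hypothetical stand-ins for S, T and dkfunction. gputimeit() waits for the GPU to finish, so it reports the real cost of the work rather than just the launch.

```matlab
% Sketch: timing GPU code with gputimeit rather than tic/toc.
% Assumes Parallel Computing Toolbox and a supported GPU. The inputs and
% the expression in f are hypothetical stand-ins for S, T and dkfunction.
S = rand(1000, 1000, 'gpuArray');
T = rand(1000, 1000, 'gpuArray');

f = @() S.^2 .* exp(-T) + sqrt(T);   % hypothetical element-wise workload
tCompute = gputimeit(f);             % runs f repeatedly, waiting for the GPU,
                                     % and returns a typical time in seconds

g = @() gather(f());                 % same work plus the copy back to the host
tWithGather = gputimeit(g);

fprintf('compute only:     %.5f s\n', tCompute);
fprintf('compute + gather: %.5f s\n', tWithGather);
```

For the code in the question, the corresponding call would presumably be t = gputimeit(@() dkfunction(S, T), 9); where the second argument tells gputimeit how many outputs to request from dkfunction.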
Walter Roberson on 6 Jan 2021
"Call wait(gpuDevice) before calling gather. You'll find it now runs a lot faster."
I think what Joss is telling you here is that there are two things happening when you gather:
  1. The code waits for the GPU to finish the calculations
  2. The data is transferred back from the GPU.
When you profile without calling wait() on the GPU device, the time recorded against the gather() line reflects both phases, which can naively make it appear that transferring the data takes a long time.
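A minimal sketch that times the two phases separately, assuming Parallel Computing Toolbox and a supported GPU. The matrix multiply is a hypothetical stand-in for dkfunction; wait(gpuDevice) blocks until queued GPU work completes, so the following gather() measures only the copy.

```matlab
% Sketch: measuring the two phases of gather() separately.
% Assumes Parallel Computing Toolbox and a supported GPU; the matrix
% multiply is a hypothetical stand-in for the work in dkfunction.
D = gpuDevice;
A = rand(4000, 4000, 'gpuArray');

tic
B = A * A;          % launch: typically returns before the GPU has finished
tLaunch = toc;

tic
wait(D)             % phase 1: block until the GPU has finished the work
tWait = toc;

tic
C = gather(B);      % phase 2: only the device-to-host copy remains
tTransfer = toc;

fprintf('launch %.4f s, wait %.4f s, transfer %.4f s\n', ...
    tLaunch, tWait, tTransfer);
```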
If you change your code to
while t<24*3600*250
%%%
%Omitted code for S and T calculation.
%%%
[dk1,dk2,dk3,dk4,dk5,dk6,dk7,dk8,dk9] = dkfunction(S,T);
wait(gpuDevice)
[dkvTdT,dkvTdS,dkvhdT,dkvhdS,dphaivdT,dphaivdS,dkhdS,dkdS,dphaidS]=gather(dk1,dk2,dk3,dk4,dk5,dk6,dk7,dk8,dk9);
%%%
%Omitted code to use dkvTdt..... to calculate other variables and t steps.
%%%
end
then the time spent waiting for the calculation to complete will be charged to the wait() line, leaving the time for the gather() line to reflect only the transfer of data back to the host. This makes it clearer whether your time is being spent waiting for the GPU to finish or transferring data back from the GPU once the calculation is finished.
Using wait(gpuDevice) will not meaningfully reduce the total time; it only changes where the time is recorded.

