Using kernels in a for-loop: GPU computation time scales linearly with iterations

I've got an algorithm in MATLAB that is based on a for-loop over time steps:
for cnt = 1:cnt_max
    % calculation based on measurement data and the result of the previous time step
end
If I use the gpuArray interface and arrayfun, my computation time per iteration scales linearly with cnt. The same happens if I write the functions in CUDA and create kernels in MATLAB to do the calculations with feval:
Make ten different kernels with parallel.gpu.CUDAKernel
Set their GridSize and ThreadBlockSize
Initialize the result variables as gpuArrays
for cnt = 1:cnt_max
    tic;
    data_cnt = gpuArray(data_cnt);   % data is stored in a matrix on the CPU
    result1_cnt = feval(myKernel1, result1_cnt, input);
    (...)
    result10_cnt = feval(myKernel10, result10_cnt, input);
    wait(gpuDevice); toc;
end
I really have no clue why my computation time keeps increasing: I neither create variables inside the loop nor change their size. I am new to GPU computing and CUDA, so I don't know what to do. I use MATLAB R2013b with the Parallel Computing Toolbox and a Tesla K20c GPU.
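One general point worth keeping in mind when timing code like this: gpuArray operations are launched asynchronously, so tic/toc without a synchronization point measures only how long it takes to *enqueue* the work, and still-pending kernels from earlier iterations can then appear to inflate later measurements. A minimal sketch of a synchronized timing pattern (the loop body here is a placeholder, not the actual kernels from the question; gputimeit is also available in recent releases and handles synchronization itself):

```matlab
% Sketch: timing per-iteration GPU work with explicit synchronization.
dev = gpuDevice;                 % handle to the current GPU
t = zeros(1, cnt_max);
for cnt = 1:cnt_max
    tic;
    % ... feval kernel calls / gpuArray arithmetic here ...
    wait(dev);                   % block until all queued GPU work finishes
    t(cnt) = toc;                % now toc reflects actual execution time
end
```

If the per-iteration time still grows with wait in place, the growth is real work (or a growing operation queue from one un-synchronized operation), not a timing artifact.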

5 Comments

Do you have any code that you can post to reproduce the problem?
I just was able to fix it.
At the end of my for-loop I calculate a distance as follows:
res = sqrt( (a-b)^2 + (c-d)^2 )
where res, a, c are gpuArrays and b, d are stored on the CPU. All variables are scalars (1×1).
Changing the matrix power ^2 into the elementwise power .^2 did the trick:
res = sqrt( (a-b).^2 + (c-d).^2 )
I don't know why, because I thought that for scalar values there is no difference between ^2 and .^2. But evidently this was the problem.
If you could post an example, it would still be very useful. You are right that for scalars ^2 and .^2 should be the same, so there may be something we need to investigate there.
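For scalars the two operators do agree numerically, but they dispatch to different functions, which on a gpuArray can take different code paths. A minimal sketch of the distinction (plain MATLAB, no GPU required; the performance difference reported above is specific to the gpuArray implementations):

```matlab
% ^  is matrix power   -> calls mpower(a, 2)
% .^ is elementwise    -> calls power(a, 2)
a = -3;
b1 = a^2;    % mpower: defined via matrix multiplication for square matrices
b2 = a.^2;   % power:  applied element by element
% For a 1x1 input both return 9, but a gpuArray may route mpower
% through a general matrix-power path, which can behave differently
% in speed and in whether the result is stored as complex.
```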
As you can see in the attached plot, in the previous version my computing time per iteration scaled up linearly. With the modification x^2 -> x.^2 I no longer have this problem.
Previous version:
for i = 1:N
    tic;
    % calculate A and B on the GPU
    res = sqrt( (A-datalist(i).x(1,1))^2 + (B-datalist(i).x(2,1))^2 );
    wait(gpuDevice);
    time_per_iteration = toc;
end
Fixed version:
for i = 1:N
    tic;
    % calculate A and B on the GPU
    res = sqrt( (A-datalist(i).x(1,1)).^2 + (B-datalist(i).x(2,1)).^2 );
    wait(gpuDevice);
    time_per_iteration = toc;
end
where A and B are scalar (1×1) gpuArrays and datalist is stored on the CPU.
I am not sure if it is related, but I just discovered another strange behaviour when using ^2 on a gpuArray.
A is a negative scalar gpuArray: A < 0, imag(A) = 0
B = A^2      -> imag(B) = 0.0000e+00  (complex, with zero imaginary part)
B = A.^2     -> imag(B) = 0
B = abs(A)^2 -> imag(B) = 0
B = A*A      -> imag(B) = 0
So if I use ^2 on a negative scalar gpuArray, the result gets an imaginary part. This part is in fact zero, but to MATLAB the result is no longer a real number.
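A quick way to check this storage difference is isreal, which reports whether a value has a complex part allocated at all (complex(x,0) compares equal to x but is stored differently). A minimal sketch, assuming a GPU is present; real() can be used to drop an all-zero imaginary part:

```matlab
A  = gpuArray(-2);   % negative 1x1 gpuArray
B1 = A^2;            % matrix power: may come back stored as complex
B2 = A.^2;           % elementwise power: stays real
isreal(B1)           % may be false even though imag(B1) is zero
isreal(B2)           % true
B1 = real(B1);       % strip the zero imaginary part if it appears
```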


Answers (0)

Asked: 16 Jan 2014

Commented: 22 Jan 2014
