Sum of squares profiling on GPU

I was profiling some code that runs on my GPU and came across something puzzling that I haven't been able to sort out. It may have something to do with the way the profiler interacts with the GPU, so I also ran the same code on the CPU and got very different results. Here is the code:
clear all
g = gpuArray.rand(600, 600, 400, 'single');
for i = 1:100
    x = sum(g, 3)/400;
    gSq = g.^2;
    y = sum(gSq, 3)/400;
    g = g+.01;
end
This is only a minimal example of the problem, not the code I am actually running, so please don't ask why anybody would do this.
On the GPU the profiler shows basically ALL of the time is spent on the line
y = sum(gSq, 3)/400;
On the CPU, the profiler shows most of the time being spent on
g = g+.01;
and the remainder of the time is evenly distributed among the other lines.
Why is summing the gSq array so expensive on the GPU relative to summing the g array? They are the same size. I don't think it is a memory issue, since my GPU has 4 GB of memory and almost 3 GB is still available with g, x, gSq and y in memory.
Any ideas?

3 Comments

Matt J on 5 Oct 2013
Edited: Matt J on 5 Oct 2013
Is this being done inside a function file or is it just a script? And, if you do the operations in a different order within the for loop, do you get the same profiling results?
The code above is the entire script. However, the original source of the problem is in a function file.
If I change the order so that the gSq sum is computed before the g sum the profiling results stay the same.
Upon further investigation, I conclude that the profiler does not attribute time to the correct lines when the code runs on the GPU. For instance, if I run
g = gpuArray.rand(600, 600, 400, 'single');
for i = 1:1000
    gSq = g.^2;
    g = g+.01;
end
the whole script finishes in about 16 seconds, and almost all of that time is assigned to the line gSq = g.^2;
However, after adding the line where the sum is computed:
g = gpuArray.rand(600, 600, 400, 'single');
for i = 1:1000
    gSq = g.^2;
    x = sum(gSq, 3);
    g = g+.01;
end
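The script now takes 40 seconds to run, and only about 0.5 seconds in total is assigned to the line gSq = g.^2; even though that same line was credited with nearly all of the 16 seconds in the previous version. This indicates that time is not being credited to the lines that actually consume it.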
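One way to check where the time really goes is to time each statement by hand and make the GPU finish before the clock is read. This is only a sketch under an assumption that has not been confirmed in this thread: that GPU calls are queued asynchronously, so the profiler ends up charging the wait to whichever later line needs the result. The iteration count nIter and the accumulator names are made up for illustration.

```matlab
% Sketch: block on the GPU after every statement so that elapsed time is
% charged to the statement that caused it (assumes asynchronous execution).
dev = gpuDevice;                          % handle to the currently selected GPU
g = gpuArray.rand(600, 600, 400, 'single');
nIter = 100;                              % hypothetical; fewer iterations than above
tSquare = 0; tSum = 0; tAdd = 0;          % per-statement accumulated seconds
for i = 1:nIter
    tic; gSq = g.^2;      wait(dev); tSquare = tSquare + toc;
    tic; x = sum(gSq, 3); wait(dev); tSum    = tSum    + toc;
    tic; g = g + .01;     wait(dev); tAdd    = tAdd    + toc;
end
fprintf('square: %.2f s, sum: %.2f s, add: %.2f s\n', tSquare, tSum, tAdd);
```

If the three totals look very different from what the profiler reports, that would support the idea that the profiler is measuring when work is submitted rather than when it completes. The extra wait calls add some overhead of their own, so the totals are a rough guide only.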
Secondly, the squaring operation, .^2, takes two to three times as long as explicitly multiplying the array by itself. Changing the line
gSq = g.^2;
to
gSq = g.*g;
results in a script that runs in about 5 seconds without the sum and 20 seconds with the sum. That means about 10 seconds are saved in computing gSq and, apparently, another 10 seconds are saved when computing sum(gSq, 3), even though the sum itself is unchanged. Very strange.
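The two ways of squaring can also be compared in isolation, away from the loop and the profiler. The sketch below assumes a release that provides gputimeit (R2013b or later, as far as I know), which waits for the GPU to finish before reporting a time. The variable names tPow, tMul and maxDiff are only for illustration.

```matlab
% Sketch: time g.^2 against g.*g on their own, then confirm on one slice
% that both forms produce the same values.
g = gpuArray.rand(600, 600, 400, 'single');
tPow = gputimeit(@() g.^2);               % seconds per call for the power form
tMul = gputimeit(@() g.*g);               % seconds per call for the product form
fprintf('g.^2: %.4f s, g.*g: %.4f s, ratio: %.2f\n', tPow, tMul, tPow/tMul);

% Compare results on a single 600x600 slice to keep extra memory use small.
s = g(:, :, 1);
d = s.^2 - s.*s;
maxDiff = gather(max(abs(d(:))));         % largest absolute difference
fprintf('max difference on one slice: %g\n', maxDiff);
```

A ratio well above 1 here would confirm that the power form is slower by itself, which would account for the time saved on the gSq line but not for the time that appears to be saved on the sum.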


Asked: 5 Oct 2013
Commented: 7 Oct 2013