Can I run a custom MATLAB function or gpuArray on another GPU?

Hello, everyone
I work on image processing and handle large-scale images. I designed an algorithm that gets the result I want through multiple iterations, and it runs successfully.
%% code example
input = zeros(500,500,50);
a = zeros(500,500,50);
b = zeros(500,500,50);
c = zeros(500,500,50);
d = zeros(500,500,50);
input = gpuArray(input);
a = gpuArray(a);
b = gpuArray(b);
c = gpuArray(c);
d = gpuArray(d);
for i=1:100
    a = input + d;
    b = a + input;
    c = b + a;
    d = func1(c);
end
I use the gpuArray type provided by MATLAB to speed up the operations, but the size of the image I can process is constrained by the size of the GPU's memory.
Can I store the arrays c and d on another GPU? Or can I run func1 on a second GPU (GPU 2)? I want to reduce the memory consumption on my current GPU (GPU 1). Do you have any suggestions?

 Accepted Answer

You can use parallel syntax to process other arrays on other GPUs at the same time, or to process some data on the CPU at the same time as processing others on the GPU. Try this documentation for some examples.
In your code example your initial arrays a, b and c are all immediately overwritten by other variables. Does that matter? Anyway, worth checking.
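As a rough illustration of running GPU work and CPU work at the same time, here is a minimal sketch using parfeval. It assumes Parallel Computing Toolbox and at least one supported GPU; the arrays x and y and the gpuSum workload are hypothetical placeholders, not part of the original question.

```matlab
% Sketch only. Assumes Parallel Computing Toolbox and at least one supported GPU.
pool = gcp;                          % reuse the current pool (one is opened if auto-create is enabled)

x = rand(200, 200, 10);              % hypothetical data for the GPU task
y = rand(200, 200, 10);              % hypothetical data for the CPU task

% Hypothetical workload: copy to the worker's GPU, reduce there, copy the scalar back.
gpuSum = @(v) gather(sum(reshape(gpuArray(v), [], 1)));

% Queue the GPU task on a pool worker; parfeval returns immediately.
f = parfeval(pool, gpuSum, 1, x);

% The client carries on with CPU work while the worker runs.
cpuResult = sum(y(:));

% Wait for the worker and collect its output.
gpuResult = fetchOutputs(f);
```

Note that this only helps when the two pieces of work are independent of each other.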

5 Comments

Thank you very much for your reply, I will try your suggestion soon.
However, I still have some doubts. My arrays depend on values obtained from the previous calculation: d depends on c, c depends on b and a, and the new value of d affects a in the next iteration. Is parallel processing possible in such a case?
As for a, b and c being immediately overwritten by other variables, that is only because this is a simplified example that omits many intermediate steps. In the real code they are updated through meaningful operations.
Assume that the arrays a, b, and c live on GPU 1 and that, for memory reasons, func1 cannot be completed on GPU 1. I tried to make changes like this:
%% code example
input = zeros(500,500,50);
a = zeros(500,500,50);
b = zeros(500,500,50);
c = zeros(500,500,50);
d = zeros(500,500,50);
gpuDevice(1);
input = gpuArray(input);
a = gpuArray(a);
b = gpuArray(b);
c = gpuArray(c);
% some omitted content
for i=1:100
    a = input + d;
    b = a + input;
    c = b + a;
    d = func1(gather(c));
end
%%%%%%%%%%%%%%%%%% split line %%%%%%%
function g = func1(i)
    gpuDevice(2);
    i = gpuArray(i);
    k = gpuArray(ones(size(i)));
    g = k.*i;
end
But when my program finishes running func1, returns the value, and goes to the next iteration, the values of a, b, and c have all been cleared.
It seems that when switching to GPU 2, all data on GPU 1 is cleared. It might work to move all the results to the CPU with gather before running func1, but that would take a lot of time because I need to iterate many times. And since the contents of GPU 1 are cleared anyway, there is no point in using GPU 2 for the calculation. Is there an efficient solution?
Regarding the GPU parallel solution you mentioned, I am not clear on how to apply it to my code. Could you please give me a code template that applies to my example?
Thanks a lot!
Best wishes!
I tried to use spmd in this way, but it did not work:
%% code example
input = zeros(500,500,50);
a = zeros(500,500,50);
b = zeros(500,500,50);
c = zeros(500,500,50);
d = zeros(500,500,50);
parpool('local',2);
spmd
    gpuDevice(1);
    input = gpuArray(input);
    a = gpuArray(a);
    b = gpuArray(b);
    c = gpuArray(c);
    % some omitted content
    for i=1:100
        a = input + d;
        b = a + input;
        c = b + a;
        d = func1(gather(c));
    end
end
%%%%%%%%%%%%%%%%%% split line %%%%%%%
function g = func1(i)
    gpuDevice(2);
    i = gpuArray(i);
    k = gpuArray(ones(size(i)));
    g = k.*i;
end
Hi Zhenhong. It does look as if your algorithm may be inherently serial, so there may be no way to parallelize it. However, a serial algorithm can often be computed in parallel chunks; for instance, a sum a+b+c+d can be computed by calculating (a+b) and (c+d) in parallel before summing the results. So rethinking your algorithm might be a way to start. Alternatively, if you call your entire algorithm multiple times with different inputs, you could do each call in parallel.
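To make the last suggestion concrete, here is a minimal sketch that runs the whole iteration once per independent input inside a parfor loop. It assumes Parallel Computing Toolbox and at least one supported GPU. The input sizes, the number of images, and the step function (a stand-in for func1) are hypothetical and chosen only so the example is small and easy to check.

```matlab
% Sketch only. Assumes Parallel Computing Toolbox and at least one supported GPU.
numImages = 4;                           % hypothetical number of independent inputs
inputs = cell(1, numImages);
for k = 1:numImages
    inputs{k} = rand(64, 64, 8);         % hypothetical small test volumes
end

step = @(c) 0.5 .* c;                    % hypothetical stand-in for func1
results = cell(1, numImages);

parfor k = 1:numImages
    in = gpuArray(inputs{k});            % lives on whichever GPU this worker uses
    d = zeros(size(in), 'like', in);     % gpuArray of zeros with the same size and type
    for it = 1:100
        a = in + d;
        b = a + in;
        c = b + a;
        d = feval(step, c);              % feval avoids any indexing ambiguity in parfor
    end
    results{k} = gather(d);              % copy each result back to host memory
end
```

Each iteration of the outer loop is still serial internally; the parallelism comes only from the inputs being independent of one another.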
As for device selection, you shouldn't be doing that. The point is to select a different device on each worker, but this is done for you automatically. If you select the device manually, you will reset the device and clear all GPU variables. If you have two devices and you open a pool with two workers, for instance by calling parpool('local',gpuDeviceCount), then any operations on those workers will take place on different devices.
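A minimal sketch of that setup, assuming Parallel Computing Toolbox, at least one supported GPU, and the R2020a names used in this thread ('local' profile, labindex). The array x and the sum are hypothetical placeholders. Calling gpuDevice with no index only queries the device already in use, so nothing is reset.

```matlab
% Sketch only. Assumes Parallel Computing Toolbox and at least one supported GPU.
if isempty(gcp('nocreate'))
    parpool('local', gpuDeviceCount);    % one worker per GPU
end

spmd
    dev = gpuDevice;                     % query only: no index argument, so no reset
    fprintf('Worker %d is using GPU %d (%s)\n', labindex, dev.Index, dev.Name);

    x = rand(1000, 'gpuArray');          % hypothetical work on this worker's GPU
    s = gather(sum(x(:)));               % bring the scalar result back to host memory
end

% On the client, s is a Composite with one entry per worker.
firstWorkerSum = s{1};
```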
There are various other odd issues with your code but of course I understand it's just a simplified example. You shouldn't need to gather the inputs to func1, because it supports gpuArray inputs. It seems you did it to switch device, but as I pointed out you should never do that within a worker. When results are returned from the worker back to the client MATLAB, they are automatically transferred from whatever GPU the worker is using to the client's GPU.
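For illustration, here is a sketch of the simplified example with the gather and the device switch removed, so that func1 receives the gpuArray directly. It assumes a single supported GPU and keeps the placeholder body of func1 from the question. It shows the calling pattern only and does not reduce memory use.

```matlab
% Sketch only. Assumes Parallel Computing Toolbox and one supported GPU.
input = zeros(500, 500, 50, 'gpuArray');   % allocate directly on the GPU
d = zeros(500, 500, 50, 'gpuArray');

for it = 1:100
    a = input + d;
    b = a + input;
    c = b + a;
    d = func1(c);                          % c is passed as a gpuArray, no gather
end

function g = func1(c)
    % No gpuDevice call here: selecting a device would reset it.
    k = ones(size(c), 'like', c);          % created on the same GPU as c
    g = k .* c;
end
```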
I understand. Despite some frustration, thank you very much for your patient response.

Release

R2020a
