parfeval inside a class method does not update a class property

I have a class with a very time-consuming method heavyTask(). This method operates on each element of an array stored as a property. Since the execution on one element does not depend on the others, I want to speed up execution by using both the local CPU and the GPU in parallel. An abstraction of this class would be:
classdef myClass < matlab.mixin.Copyable
    properties
        items
        result
    end
    methods
        function self = myClass()
        end
        function execute(self, N)
            self.items = 1:N;
            f(1) = parfeval(@heavyTask, 0, self, false);
            f(2) = parfeval(@heavyTask, 0, self, true);
            fetchOutputs(f);
        end
        function heavyTask(self, gpu)
            while not(isempty(self.items))
                n = self.items(1);
                self.items(1) = [];
                if gpu
                    self.result(n) = gather(mean(real(eig(rand(1000, 'double', 'gpuArray')))));
                else
                    self.result(n) = mean(real(eig(rand(1000))));
                end
            end
        end
    end
end
I use parfeval() to run two instances of heavyTask() in parallel: one uses gpuArray and the other does not. The workload is split via the self.items list of array elements that have not been processed yet: heavyTask() checks this list, picks one element, and removes it from the list. I cannot predict how many array elements each worker will manage to process, so this first-come-first-served idea is the only approach I have come up with.
This is how I create the class and execute the method:
a = myClass;
a.execute(4);
a.result
Unfortunately, this is what I get:
a.result
ans =
[]
However, if I replace parfeval() and fetchOutputs() with heavyTask(self, true) I get the desired behaviour:
a.result
ans =
0.4959 0.4969 0.4891 0.4778
I have not found any question that has answered my issue. The closest match I have got is this, but it does not seem to address my problem.
Is this the expected behaviour? Is there any workaround I can implement in my class?
Many thanks in advance for your help!

Accepted Answer

Matt J on 17 Jun 2021
Edited: Matt J on 17 Jun 2021
Couldn't you do something like this?
function execute(self, N)
    gd = gpuDevice;
    % Time one GPU item and one CPU item to estimate their relative speeds
    tic;
    result(1) = gather(mean(real(eig(rand(1000, 'double', 'gpuArray')))));
    wait(gd);
    tgpu = toc;
    tic;
    result(2) = mean(real(eig(rand(1000))));
    tcpu = toc;
    % Split the remaining items in proportion to the measured speeds:
    % the GPU takes roughly the tcpu/(tcpu+tgpu) share of the work
    T = floor(tcpu*(N-2)/(tcpu+tgpu));
    parfor n = 3:N
        if n <= T %GPU
            result(n) = gather(mean(real(eig(rand(1000, 'double', 'gpuArray')))));
        else %CPU
            result(n) = mean(real(eig(rand(1000))));
        end
    end
    self.result = result;
end
  10 Comments
Alberto Reig on 5 Jul 2021
According to this, wait() should not be necessary with gather(), so I would leave the code without it.
Performing a dummy GPU calculation beforehand makes all GPU iterations keep a consistent performance, so that solves the issue! It seems a somewhat dirty way to warm the GPU up, but it works. Thanks for that!
I agree that gathering after each iteration may add some overhead, but unfortunately in my case I need to gather() and clear() at each iteration, since the results take up most of the GPU memory; otherwise I get:
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
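A minimal sketch of the pattern described above (a throwaway warm-up calculation, then gather() and clear() inside the loop); N, result and g are placeholder names, not code from the thread:
warmup = gather(mean(real(eig(rand(1000, 'double', 'gpuArray')))));  % warm-up, value discarded
result = zeros(1, N);
for n = 1:N
    g = rand(1000, 'double', 'gpuArray');      % large intermediate held on the GPU
    result(n) = gather(mean(real(eig(g))));    % gather blocks until the GPU has finished
    clear g                                    % free GPU memory before the next iteration
end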
Matt J on 5 Jul 2021
Thanks for that!
You are quite welcome, but if you have a solution now to your original question, please Accept-click the answer.


More Answers (1)

Matt J on 17 Jun 2021
Edited: Matt J on 17 Jun 2021
You cannot broadcast handle objects to a parpool. They simply get cloned and used as independent class instances on the workers. If you rewrite your class as a value class and execute using value class semantics,
a=a.execute(4);
then it should work.
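For illustration only (this is not code from the answer): with value semantics, heavyTask would also have to return the modified object, and execute would overwrite its own copy with the copy returned by the worker, e.g.
function self = execute(self, N)                    % execute now returns the object
    self.items = 1:N;
    f = parfeval(gcp, @heavyTask, 1, self, false);  % heavyTask declared as: function self = heavyTask(self, gpu)
    self = fetchOutputs(f);                         % bring the worker's modified copy back
end
Note that with two futures each worker would process every element of its own copy of items, so the two returned copies would still have to be merged on the client.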
  7 Comments
Walter Roberson on 18 Jun 2021
labSend() and labReceive() are for spmd only.
DataQueue and PollableDataQueue are one-way objects. The way to send data back to the client (sketched after the steps below) is to:
  1. Have the client create a data queue before starting the workers
  2. The workers inherit the data queue. When they write to it, the client can read what was written
  3. In particular, the workers start by creating a data queue of their own, and they write it to the data queue they inherited.
  4. The client reads the data queue variables sent by the workers.
  5. The client can write to the data queue that it created in order to send data to the workers. The workers can write to the data queue that they created in order to send data to the client.
Sending data worker-to-worker is not supported using these queues... but I don't know what would happen if the client were to write the received queues to the other workers.
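A minimal sketch of steps 1, 2 and 4 above, assuming an open parallel pool; workerTask is an illustrative name, not something from the thread:
q = parallel.pool.DataQueue;                           % step 1: the client creates the queue
afterEach(q, @(r) fprintf('worker sent %.4f\n', r));   % step 4: the client reads whatever arrives
f = parfeval(gcp, @workerTask, 0, q);                  % step 2: the worker inherits the queue
wait(f);

function workerTask(q)
    % steps 2-3: writing to the inherited queue delivers the value to the client;
    % for client-to-worker traffic the worker would additionally create a
    % parallel.pool.PollableDataQueue here and send that queue back over q (step 5)
    send(q, mean(real(eig(rand(1000)))));
end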
Alberto Reig on 22 Jun 2021
Update on the first-come-first-served approach to splitting the workload: if performance varies a lot from worker to worker (as with GPU vs. CPU), it is very likely that your best-performing worker (the GPU) will find no more array elements to process and have to wait for the slowest worker to finish. You may end up underutilising the fastest worker most of the time.
I saw big improvements with this approach on a computer whose CPU and GPU perform similarly. However, after moving the code to another machine with a much better GPU, the performance was poorer than using the GPU alone.
I think @Matt J's proposal of estimating the CPU vs. GPU performance ratio beforehand and splitting the workload accordingly would be a better approach, as it should be possible to guarantee that the best-performing worker never idles.

