terminating a parfor loop early

Like lots of people before me, I'm looking for a way to get something like break functionality in a parfor loop.
My code is an embarrassingly parallel Monte Carlo code. I submit runs with different input parameters for execution in a job queue. When I submit a job I specify a maximum runtime; if I accidentally submit a job that takes longer, the job is terminated and I lose everything.
Inside the parfor loop, I'd like to check whether I've exceeded some specified runtime and, if so, get out of the loop quickly so that the code can save the data it has already accumulated to file, rather than losing it all when the job is killed.
My plan was to do something like the code below, so that if the loop takes longer than maxruntime, the loop body is effectively empty and any remaining iterations run through quickly.
My problem is that using datetime seems to be extremely slow.
Is there a better way to do what I want, or a faster way of checking the time across different cores?
maxruntime=hours(4); %set maximum runtime to 4 hours
starttime=datetime; %save the time at which we start the parfor
parfor packets=1:N
    if (datetime-starttime)<maxruntime
        %the normal loop code goes in here
        %arrays are accumulated
    end
end
%code to write out the accumulated arrays from the parfor loop is here

Accepted Answer

Raymond Norris on 2 Feb 2022
The issue you'll have with parfor is that it can't terminate early. I believe what you're describing is checkpointing: having MATLAB write to one or more files periodically. MATLAB doesn't have this built in, but it looks like you're trying to do it explicitly. Checkpointing can be useful for requeuing the job and starting further downstream, or for having a minimal set of output before a job bails, etc.
My suggestion is to use parfeval. This allows for early termination of "futures" (i.e., tasks).
maxruntime=hours(4); %set maximum runtime to 4 hours
starttime=datetime; %save the time at which we start submitting futures
for packets = 1:N
    f(packets,1) = parfeval(@unit_of_work, ..); %".." stands for the output count and inputs
end
for packets = 1:N
    [idx, ..] = fetchNext(f); %next future to finish, in any order; ".." stands for its outputs
    if (datetime-starttime)>=maxruntime
        % About to run out of time, cancel all unfinished futures
        cancel(f)
        break %stop fetching; cancelled futures have no results
    end
end
%code to write out the accumulated arrays is here
Looks like Rik has an improvement with now vs datetime you could use as well.
Obviously, this is a different approach than parfor, but it gives you the flexibility to end the loop early. You'll need to aggregate your results from fetchNext.
Another thought is to use a DataQueue. doSomething will vary quite a bit, depending on what you want to do with your aggregated array, but it's a starting point.
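To make the aggregation step concrete, here is a minimal sketch of collecting parfeval results until a time budget runs out. It is not taken from the answer above: unitOfWork, N, and maxSeconds are illustrative placeholders, and the sketch assumes each unit of work returns a single scalar. It uses the timeout form of fetchNext so that the client never blocks past the budget.

```matlab
% Sketch only: unitOfWork, N and maxSeconds are illustrative assumptions.
unitOfWork = @(k) k^2;           % placeholder for one Monte Carlo packet
N = 20;                          % number of packets
maxSeconds = 60;                 % time budget in seconds

pool = gcp;                      % current pool (starts one if needed)
timer = tic;                     % tic and toc both run on the client here

f(1:N) = parallel.FevalFuture;   % preallocate the array of futures
for k = 1:N
    f(k) = parfeval(pool, unitOfWork, 1, k);
end

results = nan(N, 1);             % NaN marks packets that never finished
nDone = 0;
while nDone < N
    remaining = maxSeconds - toc(timer);
    if remaining <= 0
        cancel(f);               % stop queued and running futures
        break
    end
    % Wait at most 'remaining' seconds for the next finished future.
    [idx, value] = fetchNext(f, remaining);
    if isempty(idx)
        continue                 % timed out; the loop re-checks the budget
    end
    results(idx) = value;
    nDone = nDone + 1;
end
```

After the loop, results holds whatever finished in time and can be written to file; entries that are still NaN correspond to cancelled packets.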
%starttime and maxruntime are defined as in the snippets above
q = parallel.pool.DataQueue;
afterEach(q, @doSomething);
parfor packets=1:N
    if (datetime-starttime)<maxruntime
        %the normal loop code goes in here
        %arrays are accumulated
        q.send(<the variable you want to accumulate>)
    end
end
function doSomething(D)
    % do something with D (e.g., write to file, etc.)
end
  1 Comment
David Spence on 2 Feb 2022
You're right - I need 'checkpointing'! I can actually break the main parfor loop into a sequence of smaller parfors, and write out data to file after each smaller loop. This solves my problem I think.
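A minimal sketch of that chunked approach might look like the following. The chunk size, the output file name, and the summing loop body are all assumptions made for illustration; the real loop would accumulate the Monte Carlo arrays instead. The time check runs on the client between chunks, so a chunk should be short compared with the safety margin before the job limit.

```matlab
% Sketch only: chunkSize, outFile and the loop body are illustrative assumptions.
N = 1000;                        % total number of packets
chunkSize = 100;                 % packets per smaller parfor
maxSeconds = 4*3600;             % time budget in seconds
outFile = fullfile(tempdir, 'checkpoint.mat');   % hypothetical checkpoint file

total = 0;                       % accumulated result (a scalar in this sketch)
lastDone = 0;                    % last packet index included in 'total'
timer = tic;
for first = 1:chunkSize:N
    if toc(timer) >= maxSeconds
        break                    % out of time; the last checkpoint stands
    end
    last = min(first + chunkSize - 1, N);
    chunkSum = 0;
    parfor packet = first:last
        chunkSum = chunkSum + packet;    % placeholder for the real work
    end
    total = total + chunkSum;
    lastDone = last;
    save(outFile, 'total', 'lastDone');  % checkpoint written on the client
end
```

Saving lastDone alongside the data means a requeued job could resume from packet lastDone + 1 rather than starting over.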


More Answers (1)

Rik on 2 Feb 2022
I personally use the now function a lot. The number it returns is in days, so you will have to scale your max time to fractional days for the comparison.
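As a rough sketch of that scaling, four hours becomes 4/24 of a day. The counting loop body is only a placeholder, and the comparison assumes the client and worker clocks agree, which may not hold on every cluster.

```matlab
% Sketch only: the loop body is a placeholder for the real work.
maxRuntimeDays = 4/24;           % 4 hours expressed as a fraction of a day
startTime = now;                 % serial date number, measured in days
N = 100;
count = 0;
parfor packet = 1:N
    % Assumes worker clocks match the client clock that set startTime.
    if (now - startTime) < maxRuntimeDays
        count = count + 1;       % placeholder for the normal loop code
    end
end
```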
  2 Comments
David Spence on 2 Feb 2022
Thanks. Yes, it seems now is quite a lot faster than datetime. Even so, adding a single line now; to my current parfor loop slows my code down by 25%... Better than the ten times slower that it runs with a single datetime; but still too large a hit to take on the performance.
Is there something intrinsic about getting a system time that makes it so slow?
Rik on 2 Feb 2022
I doubt using toc with an input would be much faster, but you could try:
maxruntime=seconds(hours(4)); %set maximum runtime to 4 hours, as a number of seconds
starttime=tic; %save the time at which we start the parfor
parfor packets=1:N
    if toc(starttime)<maxruntime
        %the normal loop code goes in here
        %arrays are accumulated
    end
end
%code to write out the accumulated arrays from the parfor loop is here
It all depends on how much you're doing in the rest of your loop. If it is fast, then even a low-cost function will result in a drastic performance decrease.
I don't know many more strategies to query the system time. I believe using a mex doesn't beat calling now.
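One way to reduce the relative cost of the time check, sketched here as an assumption rather than something tested on the original code, is to let each parfor iteration process a block of packets in an ordinary inner for loop. The check then runs once per block instead of once per packet, so even the slower datetime call is amortized. blockSize and the summing loop body are illustrative, and the comparison assumes client and worker clocks agree.

```matlab
% Sketch only: blockSize and the inner loop body are illustrative assumptions.
N = 10000;                       % total number of packets
blockSize = 500;                 % packets handled per parfor iteration
nBlocks = ceil(N/blockSize);
maxRuntime = hours(4);
startTime = datetime('now');

blockTotals = zeros(nBlocks, 1); % per-block accumulated result
blockRan = false(nBlocks, 1);    % records which blocks actually executed
parfor b = 1:nBlocks
    s = 0;
    ran = false;
    % One time check per block rather than one per packet.
    if (datetime('now') - startTime) < maxRuntime
        first = (b-1)*blockSize + 1;
        last = min(b*blockSize, N);
        for packet = first:last
            s = s + packet;      % placeholder for the normal loop code
        end
        ran = true;
    end
    blockTotals(b) = s;
    blockRan(b) = ran;
end
total = sum(blockTotals);
```

The blockRan flags show which blocks were skipped once the time limit passed, so the saved data can record what is missing.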


Release

R2021b