'parfor' seems to assign jobs for different loop indices to cores beforehand. How can I assign jobs dynamically to speed up the calculation?

I am using parfor to speed up a calculation:
Lyaps = zeros(nsample,1);   % preallocate outputs
flags = false(nsample,1);
parfor k = 1:nsample
    [Lyaps(k), flags(k)] = SympLargestLyap(SamplePoints(:,k), SympFx, opts);
end
For different values of 'k', the time for 'SympLargestLyap(SamplePoints(:,k),SympFx,opts)' to complete can vary a lot.
I find that once the calculation for most k's has completed, that is, once
n = length(find(Lyaps==0));
gives an 'n' much smaller than 'nsample', the calculation slows down.
The remaining unfinished k's are the ones for which
SympLargestLyap(SamplePoints(:,k),SympFx,opts)
takes a long time.
However, when most k's had completed and I checked my CPU, only three or four cores were occupied, even though the parallel pool has 32 workers. It seems the remaining 28 or so workers have finished their assigned iterations and are idle.
Can I assign the jobs to each worker dynamically, so that the majority of them do not sit idle like this?

Accepted Answer

Walter Roberson on 15 Dec 2024
You can create a parforOptions object specifying RangePartitionMethod "fixed" and SubrangeSize 1,
and pass that parforOptions object to parfor().
This tells parfor to allocate only a single index to a worker at a time; after finishing that single iteration, the worker goes back to ask for the next available task.
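In current releases that could look like this (the gcp call and the variable name pfOpts are my own choices for illustration, not part of the original answer):

pool = gcp;                              % handle to the current parallel pool
pfOpts = parforOptions(pool, ...
    'RangePartitionMethod', 'fixed', ... % fixed-size subranges instead of the default 'auto'
    'SubrangeSize', 1);                  % hand out one iteration at a time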
Normally parfor assigns chunks of indices to each worker, with the first chunk accounting for roughly 2/3 of the iterations, the second chunk for roughly 20% of the iterations, and the remaining 10% handed out as single iterations.
When all of the iterations take roughly the same time, the auto method works fine, with minimal waiting at the end.
However, there is always the possibility that by chance one of the initial chunks happens to contain a group of iterations that take much longer than average. For example, you might be running through files where most of the files are nearly empty, and the few more substantial files happen to be clustered near each other; the worker that got that range of indices takes a long time while all the other workers finish quickly.
With SubrangeSize set to 1 you might still have individual iterations that take a long time, but you will not end up in a situation where multiple long iterations are stuck on the same worker.
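Applied to the loop from the question, that would look roughly like the sketch below (same caveats as above):

Lyaps = zeros(nsample,1);
flags = false(nsample,1);
parfor (k = 1:nsample, pfOpts)           % pass the parforOptions object to parfor
    [Lyaps(k), flags(k)] = SympLargestLyap(SamplePoints(:,k), SympFx, opts);
end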
