Batch seems to produce insane overhead?
I am running four independent simulations, F1, F2, F3 and F4, on the same parameters. I need the first three output arguments of each, except F1, from which I need only one. I am trying to parallelize them using batch. (In hindsight, putting their function handles in a cell array and indexing over it with parfor might have been an alternative, but that seems a bit more like cheating.)
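The parfor alternative mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: `runAllInParfor` is a hypothetical helper name, `{inputargs}` stands for the same argument cell passed to batch in the question, and the output counts `[1 3 3 3]` follow the prose (one output from F1, three from the others).

```matlab
function results = runAllInParfor(fcns, nOut, inputArgs)
% Hypothetical helper: evaluate every function handle in FCNS on the same
% inputs inside a parfor loop, requesting NOUT(k) outputs from FCNS{k}.
% Without Parallel Computing Toolbox, parfor runs as an ordinary serial loop.
results = cell(1, numel(fcns));
parfor k = 1:numel(fcns)
    f = fcns{k};                 % function handle for this iteration
    out = cell(1, nOut(k));      % room for the requested number of outputs
    [out{:}] = f(inputArgs{:});  % call with the shared input arguments
    results{k} = out;            % keep all outputs of fcns{k} together
end
end
```

Usage, with the question's placeholders: `results = runAllInParfor({@F1, @F2, @F3, @F4}, [1 3 3 3], {inputargs});`, after which `results{3}{2}` would be the second output of F3.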
So, with batch, I have done it this way (R2016b):
F1batch=batch(@F1,1,{inputargs}); % F1: only the first output is needed
F2batch=batch(@F2,3,{inputargs});
F3batch=batch(@F3,3,{inputargs});
F4batch=batch(@F4,3,{inputargs});
wait(F1batch)
diary(F1batch)
F1output=fetchOutputs(F1batch);
%process F1output
wait(F2batch)
diary(F2batch)
F2output=fetchOutputs(F2batch);
%process F2output
wait(F3batch)
diary(F3batch)
F3output=fetchOutputs(F3batch);
%process F3output
wait(F4batch)
diary(F4batch)
F4output=fetchOutputs(F4batch);
%process F4output
%Plot overall results
Because I included tic/toc calls inside F1 to F4, their diaries show the time spent in each function: 7165 s, 3193 s, 6119 s and 13539 s. Since batch sends each of these jobs off to run in parallel with the others, I expected the total time to be not much larger than the longest of them, i.e. about 13540 s.
Nevertheless, when I ran this code under the profiler, it reported a total time of more than 22900 s, almost all of it spent in the wait method (so I am confident that the time for output processing and plotting in the main function, which is only basic arithmetic, was negligible). So what happened during the 22900 - 13540 = 9360 s difference?
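For reference, the timings quoted in the question work out as follows. This is plain arithmetic on the reported numbers; the profiler total is only known to be above 22900 s.

```latex
\begin{aligned}
T_{\text{longest job}} &= \max(7165,\ 3193,\ 6119,\ 13539) = 13539\ \text{s} \\
T_{\text{all jobs back to back}} &= 7165 + 3193 + 6119 + 13539 = 30016\ \text{s} \\
T_{\text{profiler}} - T_{\text{longest job}} &> 22900 - 13539 = 9361\ \text{s}
\end{aligned}
```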
I don't really suspect resource problems from running four jobs in parallel, because when I use parfor my machine defaults to six workers. Also, when I briefly looked at Task Manager, it did not show all cores occupied. (Speaking of which, why doesn't batch show the green parallel pool icon in the bottom-left corner, as parfor or spmd does?)
Walter Roberson
on 9 Oct 2017
If you are doing math on sufficiently large arrays, then the 7165 s, 3193 s, etc. might reflect elapsed times with MATLAB automatically using all 6 cores. When restricted to one core, each could take up to 6 times longer, but you get to run up to 6 of them in parallel. Your total time could then become 6 * max([7165, 3193, 6119, 13539]) = 6 * 13539, with the workers that ran the 7165, 3193 and 6119 s tasks sitting idle after finishing, along with the 2 workers that had no task at all.
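Walter's worst case, written out with the question's numbers. This assumes each job slows down by the full factor of 6 when limited to one core, so it is an upper bound under that hypothesis rather than a measurement:

```latex
T_{\text{worst}} \approx 6 \times \max(7165,\ 3193,\ 6119,\ 13539) = 6 \times 13539 = 81234\ \text{s}
```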
Edric Ellis
on 10 Oct 2017
In releases prior to R2017a, you can use maxNumCompThreads directly on the workers to control the number of computational threads - it's just not integrated into the parallel cluster profile.
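One way to apply Edric's suggestion is to wrap each simulation in a small function that calls maxNumCompThreads on the worker before running it. This is a sketch only: `runWithThreads` is a hypothetical helper name, and how many threads to give each job (so that all jobs together do not oversubscribe the cores) is left to the user.

```matlab
function varargout = runWithThreads(nThreads, fcn, varargin)
% Hypothetical helper: set the number of computational threads for the
% current MATLAB process (e.g. a batch worker), call FCN with the remaining
% inputs, and forward every output the caller requested.
oldThreads = maxNumCompThreads(nThreads);                % returns the previous setting
restore = onCleanup(@() maxNumCompThreads(oldThreads));  % put it back on exit
[varargout{1:nargout}] = fcn(varargin{:});
end
```

Usage, with the question's placeholders and an arbitrary thread count of 2: `F3batch = batch(@runWithThreads, 3, {2, @F3, inputargs});`. The restore step matters little on a worker but keeps the helper harmless if it is called on the client.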