parfor loop: How to calculate its transmission limit?

Hi,
I am wondering how I can estimate whether the parfor loop below stays within the transmission limit
    parfor i = 1:n                    % profiler: 7.41 s, 4 calls
        for j = 1:n
            if i ~= j                 % skip the diagonal i == j
                c_ij = S(a{i} & b{j});
                ex_ij = e(c_ij);
                smat(i,j) = max(ex_ij);
                abest_ij = abs(smat(i,j) - ex_ij) < 5000*eps;
                slcCell{i,j} = c_ij(abest_ij);
            end
        end
    end
for n=25 or n=26, in order to optimize this code further and extend its computational limits. For an older version of this loop, I got a Java exception indicating that the 2 GB transmission limit had been exceeded for n=25. The current loop runs for n=25, but it produces a serialization error for n=26, which is also caused by exceeding the transmission limit, with eight workers launched in total. By my naive estimate, however, I had already expected a transmission limit error for n=25 with this code. Before I turn to that estimate, here is a short description of the variables and of the loop.
Description of the variables: N=2^n-1; S and e are double vectors of length N. a{i} and b{j} are logical vectors of length N: a{i} selects the entries of S with bit i, and b{j} selects the entries of S without bit j. S, e, and b are indexed but not sliced (they are broadcast variables), whereas a is sliced. smat and slcCell are used elsewhere and are of no relevance here. According to a profile analysis, the built-ins and overhead of the complete function consume about 10 percent of the elapsed computing time.
The purpose of this parfor loop is to make a pre-selection of S for all pairs (i,j).
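To make the variable description concrete, here is a small self-contained sketch on made-up data. Several details are assumptions not stated above: S simply holds the bit masks 1..N (so its values are valid indices into e), a{k} and b{k} are built with bitget, e contains arbitrary deterministic values, and n = 4 keeps everything tiny.

```matlab
% Hypothetical small-scale inputs (assumptions, not the original data).
n = 4;
N = 2^n - 1;
S = (1:N)';                    % assumed: the bit masks 1..N
e = mod(7*S, 11);              % made-up, deterministic values
a = cell(1, n);
b = cell(1, n);
for k = 1:n
    a{k} = bitget(S, k) == 1;  % entries of S that contain bit k
    b{k} = bitget(S, k) == 0;  % entries of S that do not contain bit k
end

smat = zeros(n);               % preallocate the sliced outputs
slcCell = cell(n);
parfor i = 1:n                 % executes serially if no parallel pool is available
    for j = 1:n
        if i ~= j
            c_ij = S(a{i} & b{j});                   % bit i set, bit j clear
            ex_ij = e(c_ij);
            m_ij = max(ex_ij);
            smat(i,j) = m_ij;
            abest_ij = abs(m_ij - ex_ij) < 5000*eps; % entries tied with the maximum
            slcCell{i,j} = c_ij(abest_ij);
        end
    end
end
```

For example, smat(1,2) is the maximum of e over the masks 1, 5, 9 and 13, which is 8, attained at mask 9.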
Now, let us discuss what I expected for n=25 and n=26.
1.) n=25
1.1 The data arrays S and e have 256 MB each.
1.2 The data arrays a{i} and b{j} have 32 MB each.
Launching eight workers, I already get for case 1.1: (256+256)*8 = 4096 MB > 2 GB.
2.) n=26
2.1 The data arrays S and e have 512 MB each.
2.2 The data arrays a{i} and b{j} have 64 MB each.
Launching eight workers, I now get for case 2.1: (512+512)*8 = 8192 MB > 2 GB.
By this estimate the above code should fail in both cases, but it only fails for the second case.
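The per-array figures above follow from 8 bytes per double element and 1 byte per logical element. A short sketch that checks this with whos on small arrays and scales it to N = 2^n-1 elements (MB here means 2^20 bytes; the test arrays d and t are just for illustration):

```matlab
% Measure bytes per element on small arrays, then scale to N = 2^n - 1.
N0 = 1000;
d = zeros(N0, 1);                 % double test array
t = false(N0, 1);                 % logical test array
wd = whos('d');
wt = whos('t');
bytesPerDouble = wd.bytes / N0;
bytesPerLogical = wt.bytes / N0;

MB = 2^20;
for n = [25 26]
    N = 2^n - 1;
    fprintf('n = %d: S or e %.0f MB each, one a{i} or b{j} %.0f MB\n', ...
        n, N * bytesPerDouble / MB, N * bytesPerLogical / MB);
end
```

This prints 256 MB and 32 MB for n = 25, and 512 MB and 64 MB for n = 26.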
My questions are:
1.) What is the correct estimate for cases 1.1 and 2.1, so that I can improve this code further?
2.) The variables S, e, and b are not sliced, which probably causes a communication overhead of about 10 percent. Is it possible to make S, e, and b sliced, and would that even make sense, given that I have to pass the complete data arrays S and e to make the pre-selection for each pair (i,j)?
3.) Are there any plans to drop the 2 GB transmission limit of the parfor loop in a forthcoming MATLAB release? I fully agree with the philosophy that constraints help to optimize code, but only up to a certain point.
Cheers, Holger

Accepted Answer

Edric Ellis on 18 Jun 2012
It's not straightforward to calculate how much sliced data is sent to the workers during a PARFOR loop. The broadcast data is simpler: you only need to consider how much is sent to a single worker, since the same data is sent to each worker. When sending sliced data, PARFOR sends only a portion at a time, so it is not usually that part which exceeds the 2GB limit.
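Applying this to the sizes in the question, a rough per-worker estimate counts S, e and the whole cell array b once, with no factor for the number of workers. This is a sketch under assumptions: b is a 1-by-n cell array of logical vectors of length N, only raw array bytes are counted (no serialization overhead), and "2 GB" is taken as 2^31 bytes.

```matlab
% Rough per-worker size of the parfor broadcast data (raw array bytes only).
% Assumptions: b is a 1-by-n cell of logical vectors of length N,
% serialization overhead is ignored, and "2 GB" means 2^31 bytes.
MB = 2^20;
limitMB = 2^31 / MB;                  % 2048 MB
ns = [25 26];
totalMB = zeros(size(ns));
for k = 1:numel(ns)
    n = ns(k);
    N = 2^n - 1;
    bytesSe = 2 * 8 * N;              % S and e: two double vectors, 8 bytes/element
    bytesB = n * N;                   % b: n logical vectors, 1 byte/element
    totalMB(k) = (bytesSe + bytesB) / MB;
    % Each worker receives its own copy, so there is no factor for the worker count.
    fprintf('n = %d: S+e %.0f MB, b %.0f MB, total %.0f MB (limit %.0f MB)\n', ...
        n, bytesSe / MB, bytesB / MB, totalMB(k), limitMB);
end
```

Under these assumptions the estimate is about 1312 MB for n = 25 (below the limit) and about 2688 MB for n = 26 (above it), which would be consistent with the loop failing only for n = 26. The actual serialized size includes overhead and may differ.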
For large constants like S and e, you could try using the Worker Object Wrapper from here to send them as completely separate messages which then will not contribute towards the 2GB limit for 'broadcast' variables.
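In newer releases (R2015b and later, to my knowledge), similar functionality ships with Parallel Computing Toolbox as parallel.pool.Constant, whose Value property is read inside the loop. A minimal sketch, not tested at the problem sizes above, that wraps S, e and b this way while a stays a sliced input; the small inputs are hypothetical (S holding the bit masks 1..N, e arbitrary). Whether Constant transfers are subject to a size limit of their own is not something this sketch establishes.

```matlab
% Hypothetical small inputs (assumption: S holds the bit masks 1..N).
n = 4;  N = 2^n - 1;
S = (1:N)';  e = mod(7*S, 11);
a = arrayfun(@(k) bitget(S, k) == 1, 1:n, 'UniformOutput', false);
b = arrayfun(@(k) bitget(S, k) == 0, 1:n, 'UniformOutput', false);

% Wrap the large read-only data so it is copied to each worker once,
% outside the parfor broadcast mechanism.
Sc = parallel.pool.Constant(S);
ec = parallel.pool.Constant(e);
bc = parallel.pool.Constant(b);

smat = zeros(n);
slcCell = cell(n);
parfor i = 1:n
    Sv = Sc.Value;              % worker-side copies of the constants
    ev = ec.Value;
    bv = bc.Value;
    ai = a{i};                  % a remains a sliced input
    for j = 1:n
        if i ~= j
            c_ij = Sv(ai & bv{j});
            ex_ij = ev(c_ij);
            m_ij = max(ex_ij);
            smat(i,j) = m_ij;
            slcCell{i,j} = c_ij(abs(m_ij - ex_ij) < 5000*eps);
        end
    end
end
```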
