nested spmd instructions
Hello,
I am trying to achieve a high degree of parallelism in my MATLAB implementation of a support vector machine. Conceptually, parallelizing a multiclass support vector machine (SVM) is not very hard: a multiclass SVM is composed of a set of binary SVMs that can be trained in parallel. Moreover, if I want to validate my model using n-fold cross-validation, that gives me a second place in the code where I can exploit parallelism.
I was thinking of parallelizing my code with nested spmd statements (an outer spmd at the level of the n-fold cross-validation, an inner spmd at the level of the training of the multiclass SVM). If the inner spmd statement is in a function that is called from the outer spmd, the inner spmd always sees 0 workers available and the code crashes.
Is this normal, or am I doing something wrong? Is there an alternative way to achieve parallelism on multiple levels? (parfor doesn't do that.)
Thanks for the answers!
Marco
Answers (3)
Walter Roberson
on 25 Mar 2011
0 votes
What you see is normal: with nested spmd blocks that are visible to the program, workers are allocated only at the outer level. One of the MathWorks people who works on the parallel programming facilities has posted that what you can do is have the outer spmd block call a function, and put the spmd blocks inside that function: the inner ones would then get their own workers.
Marco
on 26 Mar 2011
0 votes
Jiro Doke
on 26 Mar 2011
Nesting spmd or parfor does not work. At least for parfor, you can put it in a function as you are doing, but the inner parfor will simply behave like a regular for loop since the outer loop will use up all the workers you open. EDIT: spmd also works in an inner function, but it doesn't use any additional workers.
I believe the only way to achieve nested parallelism is to use either matlabpooljob or batch. These let you create multiple jobs that each have their own matlabpool workers. So your "outer loop" (k-fold cross-validation) becomes multiple matlabpooljobs or batch jobs, and you specify the desired number of workers for each. Note that batch ultimately requires one extra worker in addition to the number of workers you request (because it requires a virtual client worker). Also, the jobs at the outer level are independent, so they don't talk to each other. If you need interactions between them, I think it will be extremely non-trivial. In your case, I would assume the k-fold cross-validation can be set up as k independent validation jobs, so that shouldn't be a problem.
Here's an example of how you would use matlabpooljob:
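% Get the default scheduler / job manager
jm = findResource();

% First pool job: request exactly 4 workers
job(1) = createMatlabPoolJob(jm);
job(1).MaximumNumberOfWorkers = 4;
job(1).MinimumNumberOfWorkers = 4;
% One task with 1 output; in1a and in2a are placeholders for your inputs
createTask(job(1), @trainingFcn, 1, {in1a, in2a});

% Second pool job: same setup with a different set of inputs
job(2) = createMatlabPoolJob(jm);
job(2).MaximumNumberOfWorkers = 4;
job(2).MinimumNumberOfWorkers = 4;
createTask(job(2), @trainingFcn, 1, {in1b, in2b});

% Submit the 2 jobs
for id = 1:2
    submit(job(id))
end

% Wait for each job to finish and collect its outputs
for id = 1:2
    waitForState(job(id), 'finished');
    results{id} = getAllOutputArguments(job(id));
end

% Clean up the job objects
destroy(job);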
The above example illustrates how you might create two jobs where each job requests 4 workers (for a total of 8 workers). Inside your function "trainingFcn" above, you can have spmd and parfor.
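For comparison, the batch route mentioned above might look roughly like the following. This is only a sketch under assumptions: it targets a recent Parallel Computing Toolbox release, where the pool-size option of batch is named 'Pool' (older releases used 'Matlabpool') and outputs are read with fetchOutputs. The anonymous trainingFcn and the input values are made-up stand-ins so that the snippet runs on its own; a real training function containing spmd or parfor would live in its own file.

```matlab
% Hypothetical stand-in for the real training function (illustration only)
trainingFcn = @(x, y) sum(x(:)) + y;

% Made-up inputs for two independent "folds"
inputs = {{[1 2 3], 1}, {[4 5 6], 2}};

% Each batch job uses one worker to run trainingFcn plus a pool of 3 more,
% i.e. 4 workers per job and 8 workers in total
jobs = cell(1, numel(inputs));
for id = 1:numel(inputs)
    jobs{id} = batch(trainingFcn, 1, inputs{id}, 'Pool', 3);
end

% Wait for each job, collect its outputs, then remove the job
results = cell(1, numel(jobs));
for id = 1:numel(jobs)
    wait(jobs{id});
    results{id} = fetchOutputs(jobs{id});
    delete(jobs{id});
end
```

Requesting a pool of 3 rather than 4 reflects the extra worker that batch itself occupies, so each job still stays within 4 workers.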
5 Comments
Walter Roberson
on 26 Mar 2011
Jiro, the cure for the problem of not having any workers left is to use a different pool in the called function. You are still restricted in the _total_ number of workers you have, according to the license you have purchased, so do not have your outer pool use all of the licensed workers.
Jiro Doke
on 26 Mar 2011
Walter, I don't understand what you are saying. Are you agreeing with my answer, disagreeing, or something else? Yes, the total number of workers is limited by the license you have. In my example, I assume I have access to 8, and I'm using a total of 8 (two sets of 4).
You *cannot* start a pool of workers on a worker. So opening a pool (matlabpool open) inside a function being run on a worker would not work. That's why I'm using a low-level usage of creating jobs, which themselves can use a matlabpool.
Jiro Doke
on 26 Mar 2011
The other thing is that you can only have one matlabpool open at a time. So opening a pool inside a function would not work if there is one already open.
Walter Roberson
on 25 Jun 2012
This is confusing in view of
http://www.mathworks.com/help/toolbox/distcomp/brukbnp-9.html#brukbnp-12
"The body of an spmd statement cannot contain another spmd. However, it can call a function that contains another spmd statement. Be sure that your MATLAB pool has enough workers to accommodate such expansion."
Muna Tageldin
on 27 Sep 2020
So if I have this program, which uses spmd and a for-drange loop over a distributed range (the size of array a):
spmd
    % Arrays distributed by columns (dimension 2) across the workers
    y = codistributed(rand(3,20), codistributor1d(2));
    a = codistributed(rand(1,20), codistributor1d(2));
    b = codistributed(rand(3,20), codistributor1d(2));
    % Each worker iterates only over its own part of the range
    for i = drange(1:20)
        y(1,i) = user_defined_function(a(i), b(1,i));
        y(2,i) = user_defined_function(a(i), b(1,i));
        y(3,i) = user_defined_function(a(i), b(1,i));
    end
end
user_defined_function contains 3 nested for loops and takes 75 seconds to run. I want to speed it up by using another spmd.
function y=user_defined_function(a,b)
[ep,u,lam]=ndgrid(1e-3:1e-2:1,1e-3:1e-2:1,1e-3:1e-2:1);
for i=1:size(ep,3)
    for j=1:size(ep,2)
        for p=1:size(ep,1)
            l1(p,j,i)=ep(p,j,i)+u(p,j,i)*a+sum(lam(p,j,i)*exp(-b));
        end
    end
end
From the documentation, I know I can use spmd inside another function. My question is: if I call the function 3 times inside the for-drange loop, will that affect the execution time of the function (bringing it from 75 seconds down to 4 seconds)?
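Separately from spmd, the triple loop itself can be written as a single array expression, which avoids the loop overhead and avoids growing l1 one element at a time. A minimal sketch, assuming a and b are scalars (as in the calls above, where sum over a single value changes nothing); the values of a and b below are made up for illustration:

```matlab
% Illustrative scalar inputs (assumed values, not from the original post)
a = 0.5;
b = 2;

% Same grid as in user_defined_function: 100 points per axis
pts = 1e-3:1e-2:1;
[ep, u, lam] = ndgrid(pts, pts, pts);

% Element-wise equivalent of the three nested loops; for scalar b,
% sum(lam(p,j,i)*exp(-b)) reduces to lam(p,j,i)*exp(-b)
l1 = ep + u.*a + lam.*exp(-b);
```

Whether this alone gets close to the target run time would have to be measured, for example with tic and toc around the expression.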