How do I request a local parallel pool on MDCE hosts?
I'm working in a team on an algorithm that uses nntraintool. The documentation says:
"When train and sim are called, they divide the input matrix or cell array data into distributed Composite values before training and simulation. When sim has calculated a Composite, this output is converted back to the same matrix or cell array form before it is returned."
We have 4 hosts running mdce, each contributing 10 workers to an MJS cluster, but parallel train with MEX workers runs faster on a local pool. (We suspect the marshalling of data to and from the remote workers slows them down.) I read Cleve's article Parallel MATLAB: Multiple Processors and Multiple Cores, which describes the different types of parallel computing; it seems that I want to mix types.
We call train 50 times sequentially with different combinations of seed number and slice of data (k-fold). This takes 12 hours to complete with a local pool of 10 workers, and longer with an MJS pool of 40 workers across the 4 hosts:
for fold = 1:10
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'useGPU','no');
    end
end
I would like each host to run train concurrently using its own local pool (to reduce the marshalling overhead). For example, I could do the following, but I don't know whether it would suffer the same marshalling problem:
- Use Admin Center to create 4 cluster profiles: mjs-h1-local, mjs-h2-local, mjs-h3-local, mjs-h4-local
- Change our code as follows:
p1 = parpool('mjs-h1-local');
for fold = 1:3
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'useGPU','no');
    end
end
delete(p1);   % a client can only have one open pool at a time
p2 = parpool('mjs-h2-local');
for fold = 4:6
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'useGPU','no');
    end
end
... etc.
Ideally I would do this (note that 'usepool' is my own invented name-value pair for train):
for fold = 1:10
    p{fold} = parpool(host{1 + mod(fold, numhosts)}.local);
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'usepool',p{fold}, 'useGPU','no');
    end
end
Answers (1)
Joss Knight
on 19 Dec 2016
It's not that a 'local' pool is faster, exactly; the cost comes from communicating with a remote client. Use the batch function with the 'Pool' option to run a script on a worker, which then opens a pool using the other workers on that cluster. That way the client is co-located with the pool, which reduces the communication cost. See the documentation on Batch Processing.
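A minimal sketch of that approach, assuming your training loop has been moved into a script kfold_train.m and that a cluster profile named 'mjs-h1' exists (both names are placeholders for whatever you configure):

```matlab
% Submit the training script as a batch job. 'Pool',9 requests one worker
% to run the script plus 9 more to form its parallel pool, so
% train(...,'useParallel','yes') inside the script uses workers that are
% local to the job, not to your desktop client.
job = batch('kfold_train', ...
            'Profile', 'mjs-h1', ...  % cluster profile (placeholder name)
            'Pool', 9);               % pool size = workers requested - 1

wait(job);            % block until the job finishes
load(job, 'NET');     % copy the NET variable back from the job's workspace
delete(job);          % clean up the job and its data
```

Because batch jobs run asynchronously, you could submit one job per host (each handling a subset of the folds) before calling wait, and they would run concurrently; the client then only pays the communication cost when submitting jobs and retrieving results.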