How do I request a local parallel pool on MDCE hosts?
I'm working in a team on an algorithm that uses nntraintool. The documentation says:
"When train and sim are called, they divide the input matrix or cell array data into distributed Composite values before training and simulation. When sim has calculated a Composite, this output is converted back to the same matrix or cell array form before it is returned."
We have 4 hosts running mdce, each contributing 10 workers to an MJS cluster, but parallel train with MEX workers runs faster on a local pool. (We suspect the marshalling of data to and from the remote workers slows them down.) I read Cleve's article Parallel MATLAB: Multiple Processors and Multiple Cores, which describes the different types of parallel computing; it seems that I want to mix types.
We call train 50 times sequentially with different combinations of seed number and slice of data (k-fold). This takes 12 hours to complete with a local pool of 10 workers, and longer with an MJS pool of 40 workers across the 4 hosts:
for fold = 1:10
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'useGPU','no');
    end
end
I would like each host to run train concurrently using its own local pool (to reduce the marshalling overhead). For example, I could do the following, but I don't know whether it would suffer the same marshalling problem:
- Use Admin Center to create 4 cluster profiles: mjs-h1-local, mjs-h2-local, mjs-h3-local, mjs-h4-local
- Change our code as follows:
p1 = parpool('mjs-h1-local');
for fold = 1:3
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'useGPU','no');
    end
end
delete(p1);   % a client can only have one open pool at a time
p2 = parpool('mjs-h2-local');
for fold = 4:6
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'useGPU','no');
    end
end
... etc.
Ideally I would do this (note that 'usepool' is my own invented name-value pair for train):
for fold = 1:10
    p{fold} = parpool(host{1 + mod(fold, numhosts)}.local);
    [net.divideParam.trainInd, net.divideParam.valInd, net.divideParam.testInd] = fold_indices(fold);
    for rep = 1:5
        rng(rng_init + rep);
        net = init(net);
        NET{fold,rep} = train(net, INPUT_SET, TARGET_SET, 'useParallel','yes', 'usepool',p{fold}, 'useGPU','no');
    end
end
Answers (1)
Joss Knight
on 19 Dec 2016
It's not that a 'local' pool is faster, exactly; the cost comes from communicating with a remote client. Use the batch function with the 'Pool' option to run a script on a worker, which then opens a pool using the other workers on that cluster. That way the client is co-located with the pool, which reduces the communication cost. See the documentation on Batch Processing.
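A minimal sketch of that approach, assuming your training loop has been moved into a script kfold_train.m and that a cluster profile named 'mjs-h1' exists (both names are placeholders for whatever you configure):

```matlab
% Submit the training script as a batch job. 'Pool',9 requests one worker
% to run the script plus 9 more to form its parallel pool, so
% train(...,'useParallel','yes') inside the script uses workers that are
% local to the job, not to your desktop client.
job = batch('kfold_train', ...
            'Profile', 'mjs-h1', ...  % cluster profile (placeholder name)
            'Pool', 9);               % pool size = workers requested - 1

wait(job);            % block until the job finishes
load(job, 'NET');     % copy the NET variable back from the job's workspace
delete(job);          % clean up the job and its data
```

Because batch jobs run asynchronously, you could submit one job per host (each handling a subset of the folds) before calling wait, and they would run concurrently; the client then only pays the communication cost when submitting jobs and retrieving results.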