[Parallel Computing Toolbox]: MATLAB lost connection to workers (Is R2015b unstable?)

I've been using Parallel Computing Toolbox for several days, but this problem has appeared twice since I updated MATLAB from R2014a to R2015b. I'm not sure whether the MATLAB version is to blame, because I also tested on R2009a a couple of times and it occurred every time. In short: it happened every time on R2009a, never on R2014a, and twice on R2015b.
The local pool is connected to 4 workers. However, after a while (say, a few hours), I get the following warning:
*********
Warning: A worker aborted during execution of the parfor loop. The parfor loop will now run again on the remaining workers.
> In parallel_function (line 596)
In myestimation(line 239)
Error using parallel_function (line 604)
All workers aborted during execution of the parfor loop.
Error in myestimation (line 239)
parfor i=1:I
The client lost connection to worker 1. This might be due to network problems, or the interactive communicating job might have errored.
****
MATLAB then stopped running the code because it had lost its connections to all workers. Could someone help me figure out what the cause might be? Thank you very much!

6 Comments

This behaviour means that your workers are crashing, and when the parfor loop restarts, it has to start from the beginning again. Are there any crash dumps that indicate what the problem is on the workers? (See this answer for information about locating crash dumps)
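Since a restarted parfor loop begins again from the first iteration, one way to limit the lost work is to cache each iteration's result on disk. The following is only a minimal sketch, assuming the iterations are independent. The names `runWithCheckpoints`, `saveResult` and `doOneIteration` are hypothetical and not part of the original code, and `doOneIteration` is a placeholder for the real loop body.

```matlab
function results = runWithCheckpoints(I, outDir)
% Hypothetical helper: runs I independent iterations and caches each
% result in its own MAT-file, so a rerun skips finished iterations.
if ~exist(outDir, 'dir')
    mkdir(outDir);
end
results = cell(1, I);
parfor i = 1:I
    file = fullfile(outDir, sprintf('iter_%04d.mat', i));
    if exist(file, 'file')
        s = load(file);              % reuse the cached result
        results{i} = s.value;
    else
        value = doOneIteration(i);   % placeholder for the real work
        saveResult(file, value);     % save is wrapped in a function for parfor
        results{i} = value;
    end
end
end

function saveResult(file, value)
% Calling save directly inside a parfor body is not allowed, so it is
% wrapped in a function.
save(file, 'value');
end

function value = doOneIteration(i)
% Placeholder workload (assumption); replace with the actual step.
value = i^2;
end
```

After a crash, calling `runWithCheckpoints(I, outDir)` again with the same folder would recompute only the iterations that have no file yet. A worker that dies in the middle of `save` could leave a damaged file behind, so such a file may need to be deleted by hand.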
Thank you Edric! I will look at the crash dumps.
Hi Edric. I have the same problem, and I have the crash dump files. What should I do next to identify the source of the problem?
If MATLAB is crashing, I suggest you send the crash dumps to MathWorks support.
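Alongside the crash dumps, it may help to know which iteration each worker was running when it aborted. A rough sketch, assuming a small logging helper is acceptable in the loop body: `logIteration` is a hypothetical name, and the log file location under `tempdir` is an arbitrary choice.

```matlab
function logIteration(i)
% Hypothetical helper: append a timestamped line to a per-worker log file.
t = getCurrentTask();            % empty when called on the client
if isempty(t)
    id = 0;
else
    id = t.ID;
end
logFile = fullfile(tempdir, sprintf('parfor_worker_%d.log', id));
fid = fopen(logFile, 'a');
if fid == -1
    return;                      % logging should never stop the computation
end
fprintf(fid, '%s start i=%d\n', datestr(now, 'yyyy-mm-dd HH:MM:SS'), i);
fclose(fid);
end
```

Calling `logIteration(i)` as the first statement inside the parfor body means that, after a crash, the last line of each log file names the iteration that worker had most recently started.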
Hello,
I was running MATLAB parallel computing and it was working fine, but after a few days it stopped running and gave me the same errors as reported above. I tried a number of different things, but nothing worked. My code is as follows:
%%Load the model
load_system('fourteenbus_DR_OR_c1');
parpool;
simout = cell(1,4);
tic
spmd
% Setup tempdir and cd into it
currDir = pwd;
addpath(currDir);
tmpDir = tempname;
mkdir(tmpDir);
cd(tmpDir);
% Load the model on the worker
load_system('fourteenbus_DR_OR_c1');
end
parfor i=1:4
load_system('fourteenbus_DR_OR_c1');
set_param(['fourteenbus_DR_OR_c1/DR_Protection_System1/PA_BR' num2str(i)],'tsc','.5');
% disp(['executed A' num2str(i)])
set_param(['fourteenbus_DR_OR_c1/OR_Protection_System1/PA_BR' num2str(i)],'tsc','.5');
% disp(['executed C' num2str(i)])
set_param(['fourteenbus_DR_OR_c1/Fault_' num2str(i)],'SwitchTimes','.5');
% disp(['fourteenbus_DR_OR_c1/Fault_' num2str(i)])
% disp(['executed B' num2str(i)])
sim('fourteenbus_DR_OR_c1');
end
delete(gcp);
toc
I noticed that when I comment out the line "set_param(['fourteenbus_DR_OR_c1/Fault_' num2str(i)],'SwitchTimes','.5');" the code runs fine in parallel. I cannot understand why, because it is just a set_param command like the other ones in the script. Can you please tell me how to solve this issue?
Thanks. Saqib.
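One way to narrow this down is to run the same steps in an ordinary for loop on the client, where an error message is visible instead of an aborted worker. This is only a sketch: the model, block and parameter names are copied from the code above, and nothing here is confirmed to reproduce or fix the crash.

```matlab
model = 'fourteenbus_DR_OR_c1';      % model name taken from the post
load_system(model);
for i = 1:4
    blk = [model '/Fault_' num2str(i)];
    try
        % Print the current value first; an error here points to the
        % block path or parameter name rather than to the parallel pool.
        disp(get_param(blk, 'SwitchTimes'));
        set_param(blk, 'SwitchTimes', '.5');
        sim(model);
    catch err
        fprintf('Iteration %d failed: %s\n', i, err.message);
    end
end
close_system(model, 0);              % close without saving the changes
```

If the serial loop also fails or crashes, the problem is probably in the model or the parameter value rather than in the parallel setup. If it runs cleanly, the crash dumps from the workers are the next thing to look at.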
Hello, I have run into this problem too. How did you solve it? It is urgent, and I am waiting online for a reply.


Answers (0)

Asked:

on 2 Oct 2015

Commented:

on 20 Mar 2017
