Parfor Error: lost connection

George on 7 Nov 2012
Dear all,
I am using a parfor loop in my script. The script works fine on my own computer with MATLAB 2012a, but when I run it on another computer with MATLAB 2011b, it gives me this error:
Error using parallel_function (line 598)
The session that parfor is using has shut down
Error in Myscript (line 97)
parfor k=1:px
The client lost connection to lab 6. This might be due to network problems, or the interactive matlabpool job might have errored. This is causing: java.io.IOException: An existing connection was forcibly closed by the remote host
Can anyone give me a clue about this problem?
Thank you in advance.
Kind regards,
George
  2 Comments
Walter Roberson on 7 Nov 2012
Are both machines running 32 bit or both 64 bit?
Are you attempting to transfer more than 2 GB of data?
George on 8 Nov 2012
Dear Walter,
Both are 64-bit MATLAB running on 64-bit Windows 7 with 24 GB of RAM.
The dataset is around 1.5 GB.
Thank you.
Regards,
George
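To see how close the data is to the 2 GB figure Walter asks about, the in-memory sizes of the variables can be summed with `whos`. This is only a minimal sketch: the variables `A` and `B` are stand-ins for whatever the parfor loop actually uses, and the 2 GB threshold is taken from the question above rather than from a documented limit for a particular release.

```matlab
% Stand-in data; replace with the variables actually used inside the parfor loop.
A = zeros(1000, 1000);          % 8,000,000 bytes of doubles
B = ones(500, 1, 'single');     % 2,000 bytes of singles

info = whos('A', 'B');          % struct array with one entry per variable
totalBytes = sum([info.bytes]); % total in-memory size in bytes
totalGB = totalBytes / 2^30;

fprintf('Variables occupy %.3f GB in total\n', totalGB);
if totalGB > 2                  % assumed threshold, taken from the question above
    warning('Data exceeds 2 GB; consider sending smaller pieces to the workers.');
end
```

Note that the size reported by `whos` is the size in the client workspace; how much is actually sent to each worker depends on how the loop slices or broadcasts each variable.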


Answers (2)

Jason Ross on 7 Nov 2012
There are a number of logs you can look at to try to gain some insight. By default they are located in /var/log/mdce on Linux/Mac and in %TEMP%\MDCE\log on Windows. These might tell you what was going on.
Since the connection was "forcibly closed", that could be the result of something at the OS level, and you could take a look at the system logs / event viewer for any clues as to what might be going on.
Without reviewing the logs, though, there's not a lot to go on. There are many reasons something could forcibly shut down.
You can also try running a validation of the cluster (Parallel, Manage, select cluster profile, validate) or run the connectivity tests in Admin Center (matlabroot/toolbox/distcomp/bin/admincenter) to see if there's something off with respect to your setup.
Note I'm assuming that you are using MDCE. It would also be helpful if you could list what OS you are on, too.
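As a rough illustration of the log check suggested above, the following sketch lists whatever files exist in the default Windows log folder, newest first. It assumes the default location %TEMP%\MDCE\log has not been changed; the folder may simply not exist if MDCE is not in use, in which case the script says so.

```matlab
% Default Windows location named in the answer above: %TEMP%\MDCE\log.
logDir = fullfile(getenv('TEMP'), 'MDCE', 'log');

if exist(logDir, 'dir')
    entries = dir(logDir);
    entries = entries(~[entries.isdir]);             % keep files only
    [~, order] = sort([entries.datenum], 'descend'); % sort by modification time
    entries = entries(order);                        % newest first
    for i = 1:numel(entries)
        fprintf('%s  %s\n', entries(i).date, entries(i).name);
    end
else
    fprintf('No log folder found at %s\n', logDir);
end
```

Comparing the modification times against the time of the failed parfor run shows quickly whether any of the logs were written during the failure.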
  4 Comments
George on 12 Nov 2012
Yes, I am using the local scheduler, not the job manager.
I tried running the script on 2012b. Strangely, the first two runs succeeded, but the third gave me a similar error:
The client lost connection to lab 4. This might be due to network problems, or the interactive matlabpool job might have errored.
I found MatlabDesktopCreateError.log in AppData\Local\Temp, but it was created in September.
Any suggestions?
Thank you.
Jason Ross on 19 Nov 2012
I was out of town for a little while -- unfortunately I don't have much of a general suggestion. You might want to contact support, as it might be related to your unique situation somehow.



Francisco on 5 Feb 2013
Edited: Francisco on 5 Feb 2013
Maybe you are not working entirely within the MATLAB environment. For example, you might be using a system call to get results from a 32-bit application that runs in parallel outside 64-bit MATLAB.
If so, a possible solution is to export a smaller amount of data to that application, so that it is not always working close to the limit of the memory it can use. Even if it worked for two or three cycles with a huge amount of data in memory, a tiny increase in working memory could disrupt the task executed outside MATLAB.
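The idea of exporting less data at a time could be sketched as follows. Everything here is illustrative: `data` is a stand-in for the real dataset, `chunkRows` is an assumed chunk size, and the hand-off to the external application is left as a commented placeholder because the thread does not say which application is involved.

```matlab
data = rand(1000, 4);        % stand-in for the real dataset
chunkRows = 300;             % assumed chunk size; tune to the application's memory limit
nRows = size(data, 1);
starts = 1:chunkRows:nRows;  % first row of each chunk

rowsSent = 0;
for c = 1:numel(starts)
    rows = starts(c):min(starts(c) + chunkRows - 1, nRows);
    chunk = data(rows, :);
    % Hypothetical hand-off: write the chunk to a file and invoke the external tool, e.g.
    %   save(sprintf('chunk_%d.mat', c), 'chunk');
    %   system('external_tool.exe ...');   % placeholder, not a real command
    rowsSent = rowsSent + size(chunk, 1);
end
fprintf('Sent %d rows in %d chunks\n', rowsSent, numel(starts));
```

The last chunk is allowed to be shorter than the others, so every row is passed on exactly once regardless of whether the row count divides evenly by the chunk size.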
