Parallel Configuration Validation times out, but the job finishes: what should I do?
I'm trying to configure my MATLAB client to use a cluster at my school. The setup is:
- My client machine and the cluster do not share a filesystem
- The headnode and the worker nodes on the cluster share their filesystem
- My client runs R2011b and the cluster runs R2011a
Thus I've been using the `/usr/local/matlab/r2011b/toolbox/distcomp/examples/integration/sge/nonshared` example to get set up. It appears to be mostly working:
When I start the validation, it successfully completes the 'Find Resource' step and moves on to the 'Distributed Job' step. While it's doing this, I log in to the cluster and look at the running jobs: it has submitted a job to the queue, and I can watch it move from 'pending' to 'running' to 'finished'. So the job gets done. But on my client, the validation then times out and fails.
How do I fix this so that my client MATLAB session sees that the job has finished?
Answers (1)
Edric Ellis
on 8 Mar 2012
You must use the same version of MATLAB on both the client and the cluster. That may not be the only problem, though: when jobs appear to complete on the cluster but the client cannot see that, it is often a permissions problem on the cluster's Data Location.
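One way to test the permissions explanation is to check whether your account can actually read and write everything under the Data Location. The following is a minimal sketch, not part of Parallel Computing Toolbox or the `nonshared` integration scripts; it assumes Python is available on the machine that holds the Data Location (in a non-shared setup, that may be the client, the cluster, or both) and that you pass the directory path on the command line. The script name `check_perms.py` is hypothetical.

```python
import os
import sys


def find_permission_problems(root):
    """Return paths under root that the current user cannot fully access.

    Directories must be readable, writable and searchable (execute bit);
    files must be readable and writable. Directories that cannot be listed
    at all are reported through os.walk's onerror callback.
    """
    problems = []

    def record_error(err):
        # os.walk calls this when os.scandir fails on a directory
        # (e.g. missing, or no read/search permission).
        problems.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=record_error):
        if not os.access(dirpath, os.R_OK | os.W_OK | os.X_OK):
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK | os.W_OK):
                problems.append(path)
    return problems


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_perms.py DATA_LOCATION")
    bad = find_permission_problems(sys.argv[1])
    for path in bad:
        print("not accessible:", path)
    print(f"{len(bad)} problem path(s) found")
```

`os.access` answers for the user running the script, so run it as the same account that MATLAB (and the scheduler's jobs) use; run as root, it will report almost everything as accessible.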