Parallel Configuration Validation: times out, but job finishes; what to do?

I'm trying to configure my MATLAB client to submit jobs to a cluster at my school. The setup is:
  • My client machine and the cluster do not share a filesystem
  • The headnode and the worker nodes on the cluster share their file system
  • My client runs R2011b and the cluster runs R2011a
Thus I've been using the `/usr/local/matlab/r2011b/toolbox/distcomp/examples/integration/sge/nonshared` example to get set up. It appears to be mostly working:
When I start the validation, it successfully completes the 'Find Resource' step and moves on to the 'Distributed Job' step. While it's doing this, I log in to the cluster and look at the running jobs. I can see that it has submitted a job to the queue, and I can watch it move from 'pending' to 'running' to 'finished'. So the job gets done. But then on my client, the validation times out and fails.
How do I fix this so that my client MATLAB session sees that the job has finished?

Answers (1)

You must use the same version of MATLAB on both the client and the cluster. That may not be the only problem, though: when jobs appear to complete on the cluster but the client cannot see that, it's often a permission problem on the Data Location of the cluster.
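
A rough way to check both points, offered as a sketch rather than a definitive diagnostic: run the snippet below once on the client and once in a MATLAB session on the cluster, then compare the output. The `dataLocation` value is a placeholder (it is set to `tempdir` only so the snippet runs as-is); replace it with the Data Location from your own configuration. `version` and `fileattrib` are standard MATLAB functions; everything else here is an assumption for illustration.

```matlab
% Print the MATLAB release; the string should be identical on the
% client and on the cluster (for example, both '2011a').
rel = version('-release');
fprintf('MATLAB release: R%s\n', rel);

% Placeholder path: replace with the Data Location from your configuration.
dataLocation = tempdir;

% fileattrib reports the folder's access permissions for the current user.
[ok, attrs] = fileattrib(dataLocation);
if ~ok
    fprintf('Cannot inspect %s (does the folder exist?)\n', dataLocation);
elseif attrs.UserRead && attrs.UserWrite
    fprintf('%s is readable and writable by this user.\n', dataLocation);
else
    fprintf('%s lacks read or write permission for this user.\n', dataLocation);
end
```

If the releases differ, or the folder is not readable and writable by the account that runs the jobs, that would be consistent with the symptoms described in the question.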

Asked: Ian on 7 Mar 2012
