Standalone MATLAB Parallel Server Randomly Fails/Succeeds
I'm trying to get an application to run consistently on a MATLAB parallel server (which is set up on a rack of servers in my office that are not connected to the internet) and it randomly fails/succeeds with errors like:
"Expected one output from a curly brace or dot indexing expression, but there were 0 results."
or
"Unexpected failure to indicate all intervals added."
I'm using parfor for parallelization and have AutoAddClientPath turned on, but I still need to manually attach MATLAB scripts and data files to the parallel pool to get the application to work. I do not need to attach any files manually when running locally. My MATLAB Parallel Server runs an MJS and has 8 nodes, all running Windows Server.
What I've learned so far:
- This error always involves an object that has been transferred to the parallel pool (usually one that I require to be a broadcast variable). I've gotten this error in more than one application, but it always relates to an object copied to the parallel pool.
- When I run "disp" on the object in question while debugging, it shows an empty array, I think of class double (i.e. [ ]).
- The failures are inconsistent, but seem to become more likely as the number of workers increases. Fewer than 20 workers always succeeds; 64 workers almost always fails, but has worked at least once.
- I'm attaching a large number of MATLAB script files and data files to the parallel pool. The broadcast variables I'm using can also be large.
- My application ALWAYS works on the local cluster.
I'm running out of ideas for getting this up and running. Any help would be greatly appreciated.
2 Comments
Edric Ellis
on 18 Mar 2021
It does sound like there is some sort of problem transferring some of your data to the workers. The fact that the failures are more likely with more workers suggests that perhaps it is a memory problem. Are you able to track resource usage on the worker machines while this is happening? When the first error message occurs, is there any error stack showing? (At a guess, this is coming from within the worker-side implementation of parfor when it is trying to unpack the data that has been sent from the client). Finally, it might be worth contacting MathWorks support directly to address this.
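The checks suggested above could be sketched roughly as follows. This is only a starting point, not a confirmed diagnosis: it assumes a pool is already open on the MJS cluster, and `bigData` is a hypothetical stand-in for the real broadcast variable. The `memory` function is only available on Windows, which matches the worker machines described in the question.

```matlab
pool = gcp();   % assumes a pool is already open on the MJS cluster

% Hypothetical stand-in for the real (large) broadcast variable.
bigData = struct('payload', rand(2000));

% 1) Send the variable to every worker and report what actually arrived.
%    Each worker returns {class, size, isempty}; fetchOutputs stacks one row per worker.
f = parfevalOnAll(pool, @(x) {class(x), size(x), isempty(x)}, 1, bigData);
arrived = fetchOutputs(f);
disp(arrived)

% 2) Ask each worker how much memory is available for arrays (Windows only).
f = parfevalOnAll(pool, @() getfield(memory, 'MemAvailableAllArrays'), 1);
freeBytes = fetchOutputs(f);
fprintf('Least free memory on a worker: %.1f GB\n', min(freeBytes) / 2^30);

% 3) Capture the full error stack if the parfor loop fails.
try
    parfor i = 1:64
        s = size(bigData.payload, 1);   % replace with the real loop body
    end
catch err
    disp(getReport(err, 'extended'));
end
```

If step 1 shows an empty double on some workers while the client copy is intact, that would point at the transfer itself rather than at the loop body. Comparing the step 2 numbers at 20 and 64 workers may help confirm or rule out the memory theory.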
Answers (0)