MATLAB Answers

Read-Only Globals in Parallel Computing

Dan on 17 Jul 2012
When porting a serial application to parallel, I often run into the problem that the script I'm trying to port uses globals. I know it's costly to implement fully read/write globals, but it should be possible to implement a read-only global block for the labs, especially in local, single-workstation environments. One could establish a detached memory block, set it read-only, and give each lab process access to it. This wouldn't be too hard in a distributed environment either.
My only other alternative is to pass the structures through arguments, which is very cumbersome. Here I'm talking about a long list of variables that describe the environment in which the analysis runs. Many functions reference these variables, and threading them through argument lists is a nightmare.
I'd be interested to know whether other Parallel Computing Toolbox users would find this useful, and whether MathWorks would consider implementing it.
Thanks, Dan


Answers (2)

Walter Roberson on 17 Jul 2012
There is a File Exchange (FEX) contribution for shared matrices on a single host. However, it shares only a single matrix at a time, which could get clumsy for your purposes.
Extending shared memory to DCS would present difficulties. One worker per host would have to be nominated to create the shared matrix on that host, and the "last" worker (or that initial worker) would have to be responsible for removing it. This would require coordination between the workers that does not currently exist, especially since new copies of the worker might get scheduled on the host before all of the "first round" had disappeared.
The implementation could possibly take advantage of the fact that ipcrm will not actually delete a shared memory segment until the last detach. Well, on Linux and OS X at least; I've never been convinced that MS Windows really implements POSIX to that level.
I suspect the matter might present problems in MS Windows on 32-bit systems: problems attaching the segment at the exact same virtual address in each process. The same virtual address needs to be used because the memory pointers that MATLAB uses internally are absolute addresses (absolute within the virtual address space of the process, as opposed to the segment + offset style that can be relocated relatively easily).
As best I can tell, MATLAB is not designed to work with memory pools internally, nor to bundle symbol tables, variable descriptors, and the actual memory blocks within a restricted memory range. Creating variables entirely within a memory segment for later sharing could take a lot of internal work... I don't know.

  2 Comments

Dan on 17 Jul 2012
Thanks for the quick reply. I did this type of thing under VMS years ago, so I know it can be done with a virtual memory system. Managing the distributed global will be a bit dicey, as you describe.
Walter Roberson on 17 Jul 2012
Is it possible to write programs that can share memory effectively? Yes. Is MATLAB written to handle it? I don't think so.



Mark on 19 Apr 2013
I'm quite late to the party on this one, but it seems worth a reread and a repost. I think many of us are finding that managing the data we work with is not given enough focus. Specifically, we throw around terms like "Big Data" yet have few tools to intelligently data-mine, manipulate, or transform it.
If you consider one of the fundamental uses of MATLAB and the family of similar products, embarrassingly parallel and/or distributed jobs are often parametric sweeps. I would even assert that the majority of jobs are.
Some component of the classic parametric-sweep process is almost certainly static, and likely growing exponentially in size as our notion of Big Data continues to evolve. In the financial services space this often borders on the unwieldy. A decade ago it was trivial to throw a dataset (matrix or otherwise) around when its size was simply N, but we now find ourselves at N^t, which I posit is no small issue.
I would appeal to those within MathWorks who understand this paradigm to consider it a high-priority need: provide a mechanism to declare and access a shared memory space of static global constants, at the node level, over the course of a job or jobs. If, as overhead, we need to invoke a process at the onset and close of a job (in a distributed job, for example), that seems a trivial price to pay compared with moving large datasets.
