How to use RandStreams appropriately with Parallel Computing?

I am currently working to update an existing set of code for reproducibility.
Currently, the code is structured as follows:
nlabs = 6;
seed = 1; % User-choice
[globalstream, labstreams{1:nlabs}] = RandStream.create('mrg32k3a','NumStreams',nlabs+1,'Seed',seed);
RandStream.setGlobalStream( globalstream );
parallelpool=parpool(nlabs);
spmd
    RandStream.setGlobalStream( labstreams{spmdIndex} );
end
parfor i=1:nlabs
    % Calculations here
end
However, I need the code to be fully reproducible. I understand that to achieve reproducibility with parallel computing I need to use substreams ( https://www.mathworks.com/help/stats/reproducibility-in-parallel-statistical-computations.html ). I am not confident, though, about how to distinguish the global stream from the worker streams.
I've seen an example in which the user used only a single global stream, storing and retrieving the stream state before and after the parfor loop ( https://www.mathworks.com/matlabcentral/answers/1670009-reproducible-and-independent-random-stream-generation-in-parfor-loop ), but it seems like it would be simpler to set up two independent streams.
I've outlined a two-stream setup below. Does this seem reasonable? I want globalstream and each substream of labstream to be independent.
nlabs = 6;
seed = 1; % User-choice
[globalstream, labstream] = RandStream.create('mrg32k3a','NumStreams',2,'Seed',seed);
RandStream.setGlobalStream( globalstream );
% <Some Calculations>
parallelpool=parpool(nlabs);
parallel.pool.Constant(RandStream.setGlobalStream(labstream)) % Not sure of the syntax here
parfor i=1:nlabs
    set(labstream,'Substream',i)
    % <Some Calculations>
end
RandStream.setGlobalStream( globalstream );
% <Some Calculations>

Answers (1)

Matt
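To achieve full reproducibility in parallel MATLAB code, keep client-side (global) random number generation separate from worker-side random number generation, and make the random numbers in each parfor iteration depend only on the loop index, never on which worker happens to run it. Substreams are the correct mechanism for this. Per-worker streams set in spmd (your first version) are not reproducible, because parfor assigns iterations to workers in an undefined order that can change from run to run. In your second version, RandStream.setGlobalStream(labstream) is evaluated on the client before parallel.pool.Constant is constructed, so it replaces the client's global stream instead of configuring the workers.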
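The loss of reproducibility with one stream per worker (the spmd setup in the question) can be demonstrated without a pool. The sketch below is a serial simulation, not real parfor: it assumes two hypothetical workers, four iterations, and two made-up iteration-to-worker assignments standing in for two parfor runs. It passes each worker's stream explicitly to rand, which draws the same numbers as setting that stream as the worker's global stream.

```matlab
% Serial simulation (no pool needed) of one-stream-per-worker, as in the
% spmd setup. Hypothetical assumptions: 2 "workers", 4 iterations, and two
% made-up iteration-to-worker assignments standing in for two parfor runs.
seed = 1;
nIter = 4;
assignments = {[1 2 1 2], [1 1 2 2]};   % worker that runs iteration i
results = cell(1, numel(assignments));

for r = 1:numel(assignments)
    % Fresh per-worker streams for every run, seeded identically
    [w1, w2] = RandStream.create('mrg32k3a','NumStreams',2,'Seed',seed);
    workers = {w1, w2};
    out = zeros(1, nIter);
    for i = 1:nIter
        s = workers{assignments{r}(i)};
        out(i) = rand(s);                % next number from that worker's stream
    end
    results{r} = out;
end

% Same seed, different assignment: iterations 2 and 3 get different numbers
disp(isequal(results{1}, results{2}))
```

Only iterations whose position in their worker's sequence is unchanged (here 1 and 4) keep their values, which is exactly the dependence on scheduling that substreams remove.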
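Use one stream on the client for all serial computations. To keep it independent of the worker streams, create it with RandStream.create as stream 1 of a two-stream mrg32k3a set; the workers use stream 2. Creating both as RandStream('mrg32k3a','Seed',seed) would give two identical streams, so iteration 1 on the workers would repeat the client's first draws.
seed = 1;
clientStream = RandStream.create('mrg32k3a','NumStreams',2, ...
    'StreamIndices',1,'Seed',seed);
RandStream.setGlobalStream(clientStream);
% Client-side computations
A = rand(1,10);
Each worker must have its own stream instance. parallel.pool.Constant provides this: when given a function handle, it evaluates the handle on each worker, constructing a separate RandStream there with the same seed and stream index.
nlabs = 6;
parpool(nlabs);
workerStreams = parallel.pool.Constant(@() ...
    RandStream.create('mrg32k3a','NumStreams',2, ...
        'StreamIndices',2,'Seed',seed));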
This avoids sharing stream handles and guarantees deterministic initialization on every run.
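Whether two streams overlap can be checked in serial MATLAB before any pool is involved. The sketch below, a quick sanity check rather than a statistical test, compares the first five draws of two mrg32k3a streams built from the same Seed alone with two streams created as different StreamIndices of one RandStream.create set. The variable names are illustrative.

```matlab
% Serial check for overlap between two mrg32k3a streams (no pool needed).
seed = 1;

% Pitfall: two streams built from the same generator and Seed alone
a = RandStream('mrg32k3a','Seed',seed);
b = RandStream('mrg32k3a','Seed',seed);
sameSeedOnly = isequal(rand(a,1,5), rand(b,1,5));        % true: identical draws

% Two distinct streams from the same RandStream.create set
c = RandStream.create('mrg32k3a','NumStreams',2,'StreamIndices',1,'Seed',seed);
d = RandStream.create('mrg32k3a','NumStreams',2,'StreamIndices',2,'Seed',seed);
distinctIndices = isequal(rand(c,1,5), rand(d,1,5));     % false: different draws

fprintf('same Seed only: %d, distinct StreamIndices: %d\n', ...
    sameSeedOnly, distinctIndices);
```

The same-Seed pair is the pattern to avoid when the client stream and the worker streams must be independent.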
Inside the parfor loop, assign a substream based on the loop index and set it as the worker’s global stream before generating random numbers.
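results = zeros(nlabs,5);           % one row of output per iteration
parfor i = 1:nlabs
    s = workerStreams.Value;
    s.Substream = i;                % Deterministic mapping: iteration i -> substream i
    RandStream.setGlobalStream(s);
    % Parallel computations
    results(i,:) = rand(1,5);       % sliced output, identical on every run
end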
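With mrg32k3a, substreams are non-overlapping and effectively independent, and setting Substream = i resets the stream to the start of substream i. Each iteration's random numbers therefore depend only on i, so results are reproducible regardless of scheduling, execution order, or the number of workers.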
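The reproducibility claim can also be checked without a pool. The sketch below is a serial simulation, not real parfor: it assumes two hypothetical workers, six iterations, and three made-up schedules that change both which worker runs each iteration and the order in which iterations run. Each simulated worker gets its own stream from a factory function that plays the role of the handle passed to parallel.pool.Constant (here an assumed stream-2 mrg32k3a stream), and the stream is passed explicitly to rand, which draws the same numbers as setting it as the global stream.

```matlab
% Serial simulation of the substream-per-iteration pattern (no pool needed).
% Hypothetical assumptions: 2 "workers", 6 iterations, 5 draws each, and three
% made-up schedules (worker for each iteration + execution order) standing in
% for three different parfor runs.
seed = 1;
nIter = 6;

% Factory evaluated once per simulated worker, like parallel.pool.Constant(@...)
makeWorkerStream = @() RandStream.create('mrg32k3a','NumStreams',2, ...
    'StreamIndices',2,'Seed',seed);

schedules = { ...
    struct('worker',[1 2 1 2 1 2], 'order',1:nIter), ...
    struct('worker',[1 1 1 2 2 2], 'order',nIter:-1:1), ...
    struct('worker',[2 2 2 2 2 2], 'order',[3 1 6 2 5 4])};
results = cell(1, numel(schedules));

for k = 1:numel(schedules)
    workers = {makeWorkerStream(), makeWorkerStream()};   % fresh "pool" per run
    out = zeros(nIter, 5);
    for i = schedules{k}.order
        s = workers{schedules{k}.worker(i)};
        s.Substream = i;              % reset to the start of substream i
        out(i,:) = rand(s,1,5);       % depends only on i, not on worker or order
    end
    results{k} = out;
end

disp(isequal(results{:}))             % 1: all three schedules agree
```

Contrast this with the per-worker-stream pattern: the only change is that the stream position is reset from the loop index before each iteration draws anything.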

Release

R2024b
