Distributed arrays unevenly distributed

Maria on 13 Oct 2021
Commented: Oli Tissot on 12 Nov 2021
Hi,
I have a remote cluster with 8 nodes, and each node has 16 GB of memory.
I am running an example with a large 3D matrix of size around 10000 x 4500 x 8. I have now tried launching a batch job. The matrix is created directly in the function as a distributed array:
H_sym = zeros(m,m,LENGTH_BETA,'distributed')+1j*zeros(m,m,LENGTH_BETA,'distributed');
However, if I look at each node's status (on Linux, with htop), I see that all cores of all nodes are working, and every node except the first has 4 GB of memory occupied, which does not change. The first node shows a memory allocation that varies between 8 GB and 13 GB.
Why does only the first node use more memory, and why does its usage change over time? Shouldn't "distributed" spread the matrix evenly among all nodes?
Best
Maria
  1 Comment
Oli Tissot on 12 Nov 2021
Hi Maria,
When distributed arrays are constructed, they are distributed as evenly as possible along the second dimension. In your case, that means the 4500 columns are split into 8 parts, so some workers get 10000x562x8 local parts while others get 10000x563x8 local parts. Not all workers use exactly the same amount of memory, but I do not believe that explains the discrepancy you're seeing. I suspect the computation you run on H_sym afterwards involves communication between workers, so workers receiving messages use more memory, which could explain what you are seeing. What computation are you doing on H_sym after creating it?
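To put numbers on the split described above, here is a minimal sketch of the arithmetic. It assumes exactly 8 workers and complex double elements (16 bytes each), and it assumes the leftover columns go to the first workers; none of that is confirmed in the thread, and the variable names are only illustrative.

```matlab
% Assumptions (not from the thread): 8 workers, complex double = 16 bytes,
% leftover columns assigned to the first workers.
nRows = 10000; nCols = 4500; nPages = 8;
nWorkers = 8;

% Split the second dimension as evenly as possible.
baseCols = floor(nCols / nWorkers);          % columns every worker gets
extra    = mod(nCols, nWorkers);             % workers that get one more
colsPerWorker = baseCols * ones(1, nWorkers);
colsPerWorker(1:extra) = colsPerWorker(1:extra) + 1;

% Memory held by each local part, in bytes.
bytesPerElement = 16;
bytesPerWorker  = nRows * colsPerWorker * nPages * bytesPerElement;

disp(colsPerWorker)
fprintf('Largest minus smallest local part: %.2f MB\n', ...
    (max(bytesPerWorker) - min(bytesPerWorker)) / 1e6);
```

Under these assumptions the local parts differ by a single column, which is about 1.28 MB, so the uneven split alone is far too small to account for a gap of several GB on one node.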
Finally, the way you're building H_sym is correct, but there is a more efficient way:
H_sym = zeros(m,m,LENGTH_BETA,'like',distributed(1i));
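If you want to check how the array is actually laid out, one option is to print the size of each worker's local part. This is only a sketch: it assumes Parallel Computing Toolbox with an open parallel pool, uses deliberately small hypothetical values for m and LENGTH_BETA so it runs quickly, and uses labindex, which matches R2021a.

```matlab
% Hypothetical small sizes for a quick check; replace with the real ones.
m = 200;
LENGTH_BETA = 8;

% Build the complex distributed array in a single call.
H_sym = zeros(m, m, LENGTH_BETA, 'like', distributed(1i));

% Inside spmd, each worker sees its own piece of the array.
spmd
    localSize = size(getLocalPart(H_sym));
    fprintf('Worker %d holds a local part of size %s\n', ...
        labindex, mat2str(localSize));
end
```

Comparing the printed sizes with the memory reported by htop on each node should show whether the imbalance comes from the storage of H_sym itself or from the computation that follows.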
Cheers,
Oli

Release

R2021a
