MATLAB R2024b parallel pool not working above 32 cores

Hi everyone,
I have a ThinkStation P8 with an AMD Ryzen Threadripper 7985WX running Windows 11, with MATLAB R2024b installed. The processor has 64 physical cores and 128 logical ones. When I try to validate the local cluster profile for parallel processing with more than 32 workers, the validation fails at the "SPMD job test" stage and returns the following error:
Error Report: Job errored or did not reach the state 'finished'. MATLAB worker shut down unexpectedly with status -4 during task execution.
The exit status is not always the same: it varies among -1, -2 and -4.
Any suggestions for fixing this issue? I didn't have this problem with R2023b.
Best regards,
Filippo

Answers (2)

sidik on 7 Nov 2024
Try the following steps:
Step 1: Reduce the Number of Cores Used
  1. Open MATLAB.
  2. Go to Home > Parallel > Manage Cluster Profiles.
  3. In the Cluster Profile Manager window, select the local profile from the list of cluster profiles (in recent releases it is named Processes rather than local).
  4. Click on Edit at the bottom right.
  5. In the NumWorkers section, set the number to 32 (or a lower number if you want to test gradually).
  6. Click Done to save the changes.
  7. Close the Cluster Profile Manager window.
Step 2: Test the Cluster Profile
  1. Go back to Parallel and click on Validate.
  2. Let MATLAB validate the cluster profile. If the test still fails, decrease NumWorkers (to 16 or 8) and validate again to see whether a lower number of workers resolves the issue.
Step 3: Create a Custom Cluster Profile (if needed)
  1. If validation continues to fail, go back to Manage Cluster Profiles and click on New Profile.
  2. Name the new profile (e.g., CustomProfile).
  3. In NumWorkers, try a reasonable number (such as 16 or 24).
  4. Save by clicking Done.
  5. Set this new profile as the active profile by checking the box next to its name.
  6. Validate the profile by clicking on Validate.
If all of the above steps fail, I suggest contacting MathWorks support and opening a support ticket.
Don't hesitate to reply if you're still stuck.
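
The same steps can also be tried from the command line. The following is only a rough sketch, not the validator itself: it assumes Parallel Computing Toolbox is installed and that the default local profile is named 'Processes' (older releases call it 'local'). The variable names are just for illustration.

```matlab
% Sketch only: assumes the local profile is called 'Processes'
% (use 'local' on older releases).
c = parcluster('Processes');

nWorkers = 32;              % value from Step 1; try 16 or 8 if this fails
c.NumWorkers = nWorkers;    % same setting as NumWorkers in the profile manager
saveProfile(c);             % persist the change to the profile

% Open a pool of that size and run a trivial spmd block.
pool = parpool(c, nWorkers);
spmd
    total = spmdPlus(1);    % every worker contributes 1, so total = pool size
end
nOpened = pool.NumWorkers;
fprintf('Pool has %d workers, spmd sum is %d\n', nOpened, total{1});
delete(pool);               % shut the pool down again
```

If this runs at 32 but the workers crash at a higher value of nWorkers, that reproduces the problem outside the validation dialog.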

Filippo Ambrosino on 7 Nov 2024
Hello @sidik,
Thanks for your help. I decreased the number of workers and it works properly. Unfortunately, I need at least 64 workers; otherwise my routines take far too long to run.
Best regards,
Filippo
  8 Comments
Raffael Kozerski on 2 Jan 2025 at 20:09
Edited: Raffael Kozerski on 2 Jan 2025 at 20:11
Same here:
Running a simulation with more than 60 workers crashes with R2024b on several machines.
The same simulation runs fine with R2024a using 700 cores/MATLAB workers.
I have no idea why R2024b crashes; it also happens when running the SPMD validation test.
In the job log there is only a "Matlab crashed on worker XXX" message, with no other useful information.
Raffael
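
For anyone who wants to collect more detail for a support ticket, here is a hedged sketch that submits a small communicating (spmd-type) job by hand and prints whatever the cluster logged for it. It assumes the local profile is named 'Processes', and the worker count of 64 is only an example value. Whether the log contains more than the crash message is not guaranteed.

```matlab
% Sketch only: assumes the local profile is called 'Processes'.
c = parcluster('Processes');
n = 64;                                   % example worker count to test

% Build a communicating job that needs exactly n workers.
j = createCommunicatingJob(c, 'Type', 'spmd');
j.NumWorkersRange = [n n];
createTask(j, @spmdIndex, 1, {});         % each worker returns its own index

submit(j);
wait(j);                                  % blocks until finished or failed

disp(j.State);                            % 'finished' or 'failed'
for k = 1:numel(j.Tasks)
    t = j.Tasks(k);
    if ~isempty(t.ErrorMessage)
        fprintf('Task %d: %s\n', k, t.ErrorMessage);
    end
end
getDebugLog(c, j)                         % scheduler/worker output for the job
delete(j);                                % remove the test job afterwards
```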

Release

R2024b