parpool() stalls on Xeon Phi x200 with >50 workers

mph on 18 May 2018
Commented: mph on 22 May 2018
I am evaluating parpool() on my new Intel Xeon Phi "Knights Landing" 7210. I find that parpool('local',NumWorkers) successfully creates a pool for NumWorkers<51, but it stalls and fails for any number equal to or greater than 51.
My system: 64 physical cores | 256 logical cores | 6x16GB memory | OS = CentOS Linux | MATLAB version R2018a
Attempted solutions: (1) changed the Java heap size between 512MB and 8192MB; (2) set the Java ThreadStackSize via $MATLAB/bin/glnxa64/java.opts (tried -XX:ThreadStackSize=8192 and 16384); (3) called distcomp.feature( 'LocalUseMpiexec', false );
Each worker created by parpool takes about 0.5GB (according to top), so plenty of system memory is left. Java memory resources do not seem to be depleted either.
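As a quick sanity check on the memory claim, the figures quoted above can be turned into a rough budget. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and 0.5GB per worker is the approximate value read from top, not a measured constant.

```matlab
% Rough memory budget using only the numbers quoted in the post.
totalMemoryGB = 6 * 16;    % 6x16GB installed memory
perWorkerGB   = 0.5;       % approximate per-worker footprint seen in top
numWorkers    = 51;        % smallest pool size that stalls

workersGB   = numWorkers * perWorkerGB;    % memory taken by the workers
remainingGB = totalMemoryGB - workersGB;   % memory left for everything else

fprintf('Workers: %.1f GB, remaining: %.1f GB of %d GB\n', ...
    workersGB, remainingGB, totalMemoryGB);
% prints: Workers: 25.5 GB, remaining: 70.5 GB of 96 GB
```

On these numbers a 51-worker pool needs roughly a quarter of the installed memory, which is consistent with system memory not being the limiting resource.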
Here is a test I ran:
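%% parpool() test
distcomp.feature( 'LocalUseMpiexec', false )
% The right-hand side of the next assignment is missing from the post. The call
% below is an assumed reconstruction; it returns the JVM input arguments.
JavaRuntimeSettings = java.lang.management.ManagementFactory.getRuntimeMXBean.getInputArguments
% Free system memory and JVM heap figures before any pool is started
[~,freeSystemMemory]=system('vmstat -s -S M | grep "free memory"')
rJavaObj = java.lang.Runtime.getRuntime;
freeMemory = rJavaObj.freeMemory
totalMemory = rJavaObj.totalMemory
maxMemory = rJavaObj.maxMemory
for NumberOfWorkers = [50, 51]
    tic % start the timer that TimeElapsed reads
    pool = parpool('local',NumberOfWorkers)
    TimeElapsed = toc
    % Same memory figures with the pool running
    [~,freeSystemMemory]=system('vmstat -s -S M | grep "free memory"')
    rJavaObj = java.lang.Runtime.getRuntime;
    freeMemory = rJavaObj.freeMemory
    totalMemory = rJavaObj.totalMemory
    maxMemory = rJavaObj.maxMemory
    delete(pool) % shut the pool down before trying the next size
end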
And here is the output I get:
ans =
JavaRuntimeSettings =
[-Xms64m, -XX:NewRatio=3, -Xmx2048m, -XX:MaxDirectMemorySize=2147400000, -XX:+AllowUserSignalHandlers, -Xrs, -XX:ThreadStackSize=16384, -Djava.library.path=/usr/local/MATLAB/R2018a/bin/glnxa64:/usr/local/MATLAB/R2018a/sys/jxbrowser/glnxa64/lib, vfprintf, -XX:ErrorFile=/home/mph/hs_error_pid38489.log, abort, -Duser.language=en,, -Dfile.encoding=UTF-8, -XX:ParallelGCThreads=6]
freeSystemMemory =
' 85393 M free memory
freeMemory =
totalMemory =
maxMemory =
Starting parallel pool (parpool) using the 'local' profile ...
connected to 50 workers.
pool =
Pool with properties:
Connected: true
NumWorkers: 50
Cluster: local
AttachedFiles: {}
AutoAddClientPath: true
IdleTimeout: 3 minutes (3 minutes remaining)
SpmdEnabled: true
TimeElapsed =
freeSystemMemory =
' 65170 M free memory
freeMemory =
totalMemory =
maxMemory =
Parallel pool using the 'local' profile is shutting down.
Starting parallel pool (parpool) using the 'local' profile ...
connected to 51 workers.
At that point it stalls and I never get the prompt back. Using the top command in the Linux terminal, I can see plenty of idle MATLAB workers.
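The two vmstat readings in the output above give a rough cross-check of the per-worker footprint. The sketch below assumes the whole drop in free memory is due to the 50-worker pool, which ignores caches and other processes, and the variable names are illustrative only.

```matlab
% Per-worker memory estimate from the two "free memory" readings above.
freeBeforeMB = 85393;   % before the 50-worker pool was started
freeAfterMB  = 65170;   % with the 50-worker pool running
numWorkers   = 50;

usedMB      = freeBeforeMB - freeAfterMB;   % total drop in free memory
perWorkerMB = usedMB / numWorkers;          % average drop per worker

fprintf('Pool of %d workers consumed %d MB, about %.0f MB per worker\n', ...
    numWorkers, usedMB, perWorkerMB);
% prints: Pool of 50 workers consumed 20223 MB, about 404 MB per worker
```

That is in line with the roughly 0.5GB per worker reported by top, and it leaves about 65GB free when the 51-worker attempt begins.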
When I terminate the process (Ctrl+C) from within MATLAB, I get the following:
Operation terminated by user during parallel.internal.queue.JavaBackedFuture/waitScalar (line 211)
In parallel.Future>@(o)waitScalar(o,predicate,waitGranularity,deadline)
In parallel.Future/wait (line 292)
ret = all(arrayfun(@(o) waitScalar(o, predicate, waitGranularity, deadline), ...
In parallel.Future/fetchOutputsImpl (line 574)
In parallel.Future/fetchOutputs (line 341)
varargout = fetchOutputsImpl(F(:), nargout, varargin{:});
In parallel.Pool>iPostLaunchSetup (line 674)
mapping = fetchOutputs(parfevalOnAll(pool, @iGetMachineToWorkerMappingAndUnfreezePaths, 1, ...
In parallel.Pool.hBuildPool (line 588)
iPostLaunchSetup(aPool, client.ParallelJob.AdditionalPaths);
In parallel.internal.pool.doParpool (line 18)
pool = parallel.Pool.hBuildPool(constructorArgs{:});
In parpool (line 98)
pool = parallel.internal.pool.doParpool(varargin{:});
In partictoc (line 12)
pool = parpool('local',NumberOfWorkers)
So, what are these workers waiting for, and why? How can I make them do work?

Answers (1)

Sangeetha Jayaprakash on 21 May 2018
If you are referring to Xeon Phi host processors (as introduced with the Knights Landing architecture), they are compatible with the Parallel Computing Toolbox, like any other x86_64 processor with multiple cores. Xeon Phi coprocessors, on the other hand, are not currently supported.
mph on 22 May 2018
[root@230-83 mph]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 4
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 87
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping: 1
CPU MHz: 1297.968
CPU max MHz: 1500.0000
CPU min MHz: 1000.0000
BogoMIPS: 2600.09
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 0-255
NUMA node1 CPU(s):
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait epb fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm arat pln pts
And here is the information on my MATLAB installation:
>> ver('distcomp')
MATLAB Version: (R2018a)
MATLAB License Number: 648372
Operating System: Linux 3.10.0-693.21.1.el7.x86_64 #1 SMP Wed Mar 7 19:03:37 UTC 2018 x86_64
Java Version: Java 1.8.0_144-b01 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
Parallel Computing Toolbox Version 6.12 (R2018a)
