Segmentation Faults and Thread-Safety in Parfor Loops: Part II

I am currently running multiple repetitions of an experiment that uses MEX files and the CPLEX 12.4 API in MATLAB 2012a. Although the code for my experiment runs perfectly in serial, I receive segmentation faults when I run it in parallel using parfor in MATLAB (see my former post: http://www.mathworks.com/matlabcentral/answers/35754-segmentation-faults-when-running-mex-files-in-parallel). I have recently found out that this is due to the putenv() function in the C library, which is called by one of the functions in the CPLEX 12.4 API.
After posting about this issue on the CPLEX forums, I was advised to include <pthread.h> in my MEX file and to wrap the non-thread-safe call between pthread_mutex_lock and pthread_mutex_unlock. I have done this by changing:
void mexFunction(...) {
CPXENVptr env = NULL;
int status = 0;
env = myNonThreadSafeFunction (&status);
//... more code
}
to
//declare a pthread_mutex_t before the MEX function
static pthread_mutex_t myLock = PTHREAD_MUTEX_INITIALIZER;
void mexFunction(...) {
int status = 1;
CPXENVptr env = NULL;
//wrapped non-thread-safe function with mutex_lock
pthread_mutex_lock(&myLock);
env = myNonThreadSafeFunction (&status);
pthread_mutex_unlock(&myLock);
//... more code
}
Note that I also declared the pthread_mutex_t as a static variable in a standalone MEX file that runs before the parfor loop. This is meant to ensure that it is visible to all the workers inside the parfor loop.
Despite this fix, I still receive segmentation faults and have some questions:
1. Should the approach outlined above work?
2. Some people have suggested that mxMalloc and mxFree are not thread-safe. If so, would it help to wrap these calls in the mutex as well?
3. Is there any explanation for why I receive these faults when running the experiments in parallel on a Linux cluster, but not when I run them in parallel on Mac OS X 10.7?
4. Something else I can't understand: before I implemented the pthreads fix, the segmentation fault would cause one of the workers in my parfor loop to stall (the parfor loop would just "wait" to hear back from the crashed worker, which would never happen). Now the segmentation fault actually shuts down the parfor loop and I get the following error message:
Error message: The session that parfor is using has shut down
in line: 47
of file: home/software/matlab/matlab-2012a/toolbox/matlab/lang/parallel_function.m

3 Comments

Also here is a segmentation fault stack trace, fresh from the oven:
------------------------------------------------------------------------
Segmentation violation detected at Mon Apr 23 13:53:12 2012
------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled
Current Visual : None
Default Encoding: UTF-8
GNU C Library : 2.14.90 development
MATLAB Root : /home/software/matlab/matlab-2012a
MATLAB Version : 7.14.0.739 (R2012a)
Operating System: Linux 3.3.1-3.fc16.x86_64 #1 SMP Wed Apr 4 18:08:51 UTC 2012 x86_64
Processor ID : x86 Family 6 Model 26 Stepping 4, GenuineIntel
Virtual Machine : Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
Window System : No active display
Fault Count: 1
Abnormal termination:
Segmentation violation
Register State (from fault):
RAX = 0000000000000010 RBX = 0000000000000011
RCX = 0000000000000030 RDX = 0000000000000011
RSP = 00007f6ef3468f28 RBP = 00007f6ef3468fa0
RSI = 00007f6ef3468fb0 RDI = 00007f6e0ecb8990
R8 = 000000000000ffff R9 = 0000000000000060
R10 = 0000003b46489e20 R11 = 0000000000000011
R12 = 00007f6ef3468fb0 R13 = 00007f6e0ecb8990
R14 = 0000000000000050 R15 = 0000003b467b4598
RIP = 0000003b46488acc EFL = 0000000000010283
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x00007f6f0b89f92e /home/software/matlab/matlab-2012a/bin/glnxa64/libmwfl.so+00370990 _ZN2fl4diag15stacktrace_base7captureERKNS0_14thread_contextEm+000158
[ 1] 0x00007f6f0b8a27d0 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwfl.so+00382928
[ 2] 0x00007f6f0b8a2b3b /home/software/matlab/matlab-2012a/bin/glnxa64/libmwfl.so+00383803 _ZN2fl4diag13terminate_logEPKcRKNS0_14thread_contextE+000171
[ 3] 0x00007f6f0a784203 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01253891 _ZN2fl4diag13terminate_logEPKcPK8ucontext+000067
[ 4] 0x00007f6f0a7810fd /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01241341
[ 5] 0x00007f6f0a78279d /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01247133
[ 6] 0x00007f6f0a782925 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01247525
[ 7] 0x00007f6f0a782f01 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01249025
[ 8] 0x00007f6f0a7833f5 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01250293
[ 9] 0x0000003b4700f500 /lib64/libpthread.so.0+00062720
[ 10] 0x0000003b46488acc /lib64/libc.so.6+00559820
[ 11] 0x0000003b4643915a /lib64/libc.so.6+00233818
[ 12] 0x0000003b4643903d /lib64/libc.so.6+00233533 putenv+000109
[ 13] 0x00007f6e450a1676 /home/software/cplex/cplex-12.4/cplex/matlab/cplexlink124.mexa64+02860662
[ 14] 0x00007f6e44ead434 /home/software/cplex/cplex-12.4/cplex/matlab/cplexlink124.mexa64+00812084
[ 15] 0x00007f6e44eacaf4 /home/software/cplex/cplex-12.4/cplex/matlab/cplexlink124.mexa64+00809716
[ 16] 0x00007f6e44eac30c /home/software/cplex/cplex-12.4/cplex/matlab/cplexlink124.mexa64+00807692
[ 17] 0x00007f6e44e65c09 /home/software/cplex/cplex-12.4/cplex/matlab/cplexlink124.mexa64+00519177
[ 18] 0x00007f6e44e65146 /home/software/cplex/cplex-12.4/cplex/matlab/cplexlink124.mexa64+00516422 mexFunction+000448
[ 19] 0x00007f6f049bccca /home/software/matlab/matlab-2012a/bin/glnxa64/libmex.so+00109770 mexRunMexFile+000090
[ 20] 0x00007f6f049b8f79 /home/software/matlab/matlab-2012a/bin/glnxa64/libmex.so+00094073
[ 21] 0x00007f6f049b9de1 /home/software/matlab/matlab-2012a/bin/glnxa64/libmex.so+00097761
[ 22] 0x00007f6f0a42c063 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_dispatcher.so+00479331 _ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2_+000515
[ 23] 0x00007f6f09cf2476 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01987702
[ 24] 0x00007f6f09ca3426 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01664038
[ 25] 0x00007f6f09ca7be4 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01682404
[ 26] 0x00007f6f09ca4333 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01667891
[ 27] 0x00007f6f09ca5037 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01671223
[ 28] 0x00007f6f09d0e690 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+02102928
[ 29] 0x00007f6f0a42c063 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_dispatcher.so+00479331 _ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2_+000515
[ 30] 0x00007f6f035d9a6f /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01522287
[ 31] 0x00007f6f03575124 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01110308
[ 32] 0x00007f6f03575bab /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01113003
[ 33] 0x00007f6f03575ebe /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01113790
[ 34] 0x00007f6f0357718d /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01118605
[ 35] 0x00007f6f035772ad /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01118893
[ 36] 0x00007f6f0357754c /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01119564 _Z27omConstructObjectWithClientN4mcos9COSNameIDEiPPK11mxArray_tagPKNS_9COSClientE+000476
[ 37] 0x00007f6f035e21bd /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01556925
[ 38] 0x00007f6f0365c563 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+02057571
[ 39] 0x00007f6f0a3e66a1 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_dispatcher.so+00194209 _ZN13Mfh_MATLAB_fn11dispatch_fhEiPP11mxArray_tagiS2_+000481
[ 40] 0x00007f6f09cf2476 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01987702
[ 41] 0x00007f6f09ca3426 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01664038
[ 42] 0x00007f6f09ca7be4 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01682404
[ 43] 0x00007f6f09ca4333 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01667891
[ 44] 0x00007f6f09ca5037 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01671223
[ 45] 0x00007f6f09d0e690 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+02102928
[ 46] 0x00007f6f0a42c063 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_dispatcher.so+00479331 _ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2_+000515
[ 47] 0x00007f6f035d4388 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01500040
[ 48] 0x00007f6f03575c22 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01113122
[ 49] 0x00007f6f03575ebe /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01113790
[ 50] 0x00007f6f03577c7c /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01121404
[ 51] 0x00007f6f035c0236 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01417782 _Z19omCallLoadobjMethodRN4mcos15COSInterfacePtrENS_8COSValueENS_14COSDataTypePtrE+000694
[ 52] 0x00007f6f035c0937 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01419575
[ 53] 0x00007f6f034fa676 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+00607862
[ 54] 0x00007f6f035b20eb /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01360107
[ 55] 0x00007f6f035a3f41 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01302337
[ 56] 0x00007f6f035a43fe /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01303550
[ 57] 0x00007f6f035a53f8 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcos.so+01307640
[ 58] 0x00007f6f0b5f1654 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00419412
[ 59] 0x00007f6f0b5f357a /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00427386
[ 60] 0x00007f6f0b5f3017 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00426007
[ 61] 0x00007f6f0b5f2b18 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00424728
[ 62] 0x00007f6f0b5f3017 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00426007
[ 63] 0x00007f6f0b5f3017 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00426007
[ 64] 0x00007f6f0b5f3017 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00426007
[ 65] 0x00007f6f0b5f3017 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00426007
[ 66] 0x00007f6f0b5f3a23 /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00428579 miGetCurrentItem+000387
[ 67] 0x00007f6f0b5d61db /home/software/matlab/matlab-2012a/bin/glnxa64/libmx.so+00307675 mxDeserializeWithTag+000315
[ 68] 0x00007f6e4daa01e0 /home/software/matlab/matlab-2012a/toolbox/distcomp/distcomp/distcompdeserialize.mexa64+00008672 mexFunction+000080
[ 69] 0x00007f6f049bccca /home/software/matlab/matlab-2012a/bin/glnxa64/libmex.so+00109770 mexRunMexFile+000090
[ 70] 0x00007f6f049b8f79 /home/software/matlab/matlab-2012a/bin/glnxa64/libmex.so+00094073
[ 71] 0x00007f6f049b9de1 /home/software/matlab/matlab-2012a/bin/glnxa64/libmex.so+00097761
[ 72] 0x00007f6f0a42c063 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_dispatcher.so+00479331 _ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2_+000515
[ 73] 0x00007f6f09cf2476 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01987702
[ 74] 0x00007f6f09ca3426 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01664038
[ 75] 0x00007f6f09ca7be4 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01682404
[ 76] 0x00007f6f09ca4333 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01667891
[ 77] 0x00007f6f09ca5037 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01671223
[ 78] 0x00007f6f09d0e690 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+02102928
[ 79] 0x00007f6f0a42c063 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_dispatcher.so+00479331 _ZN8Mfh_file11dispatch_fhEiPP11mxArray_tagiS2_+000515
[ 80] 0x00007f6f09cd165f /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01853023
[ 81] 0x00007f6f09cd0d49 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01850697
[ 82] 0x00007f6f09c322ec /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01200876 inCallFcnWithTrap+000092
[ 83] 0x00007f6f09c97b6b /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01616747
[ 84] 0x00007f6f09c31438 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwm_interpreter.so+01197112 _Z28inCallFcnWithTrapInDesiredWSiPP11mxArray_tagiS1_PKcbP15inWorkSpace_tag+000104
[ 85] 0x00007f6f05025d79 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwiqm.so+02485625 _ZN3iqm15BaseFEvalPlugin7executeEP15inWorkSpace_tagRN5boost10shared_ptrIN14cmddistributor17IIPCompletedEventEEE+000457
[ 86] 0x00007f6ec6ffe09d /home/software/matlab/matlab-2012a/bin/glnxa64/libnativejmi.so+00696477 _ZN9nativejmi14JmiFEvalPlugin7executeEP15inWorkSpace_tagRN5boost10shared_ptrIN14cmddistributor17IIPCompletedEventEEE+000173
[ 87] 0x00007f6ec70306e5 /home/software/matlab/matlab-2012a/bin/glnxa64/libnativejmi.so+00902885 _ZN3mcr3mvm27McrSwappingIqmPluginAdapterIN9nativejmi14JmiFEvalPluginEE7executeEP15inWorkSpace_tagRN5boost10shared_ptrIN14cmddistributor17IIPCompletedEventEEE+000629
[ 88] 0x00007f6f0500e482 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwiqm.so+02389122
[ 89] 0x00007f6f04fff264 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwiqm.so+02327140
[ 90] 0x00007f6f0a9f0afc /home/software/matlab/matlab-2012a/bin/glnxa64/libmwbridge.so+00125692 _Z10ioReadLinebP8_IO_FILEPcS1_iPbRKN5boost8optionalIKP15inWorkSpace_tagEEb+000508
[ 91] 0x00007f6f0a9f1165 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwbridge.so+00127333
[ 92] 0x00007f6f0a9f5d0a /home/software/matlab/matlab-2012a/bin/glnxa64/libmwbridge.so+00146698
[ 93] 0x00007f6f0a9f6165 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwbridge.so+00147813
[ 94] 0x00007f6f0a9f69ce /home/software/matlab/matlab-2012a/bin/glnxa64/libmwbridge.so+00149966 mnParser+000702
[ 95] 0x00007f6f0a768de2 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01142242 _ZN11mcrInstance30mnParser_on_interpreter_threadEv+000034
[ 96] 0x00007f6f0a74b51a /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01021210
[ 97] 0x00007f6f0a74b598 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01021336
[ 98] 0x00007f6f0af7e053 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwservices.so+00823379
[ 99] 0x00007f6f0af7e315 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwservices.so+00824085
[100] 0x00007f6f0af7cbc1 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwservices.so+00818113
[101] 0x00007f6f010e6605 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwuix.so+00505349
[102] 0x00007f6f0b0029a1 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwservices.so+01366433 _ZSt8for_eachIN9__gnu_cxx17__normal_iteratorIPN5boost8weak_ptrIN4sysq10ws_ppeHookEEESt6vectorIS6_SaIS6_EEEENS4_8during_FIS6_NS2_10shared_ptrIS5_EEEEET0_T_SH_SG_+000081
[103] 0x00007f6f0b003aab /home/software/matlab/matlab-2012a/bin/glnxa64/libmwservices.so+01370795
[104] 0x00007f6f0b0015f9 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwservices.so+01361401 _Z25svWS_ProcessPendingEventsiib+000665
[105] 0x00007f6f0a74a76f /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01017711
[106] 0x00007f6f0a74ac3b /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01018939
[107] 0x00007f6f0a74ad97 /home/software/matlab/matlab-2012a/bin/glnxa64/libmwmcr.so+01019287
[108] 0x0000003b47007d90 /lib64/libpthread.so.0+00032144
[109] 0x0000003b464f0f5d /lib64/libc.so.6+00986973 clone+000109
This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.
If this problem is reproducible, please submit a Service Request via:
http://www.mathworks.com/support/contact_us/
A technical support engineer might contact you with further information.
Thank you for your help.
If it were me, I'd take the advice of the last 4 lines of that.
Hi Berk, were you able to figure out a fix for this problem? I am facing a similar issue with a third-party MEX file that runs perfectly in a for loop but gives a segmentation fault in parfor. The fault is also unpredictable: I cannot reproduce it with specific inputs that I could share with the MEX file's authors. Any update would be much appreciated!


Answers (1)

Please note that 'thread safety' really shouldn't be the major concern here. When you open a MATLABPOOL, you are starting several completely separate MATLAB worker processes. In fact, the worker processes are started with the '-singleCompThread' option so that there's only 1 computational thread.
In particular, you may need to consider any setup you're doing, because it has to be done afresh on each worker process. Perhaps like this:
matlabpool open ...
spmd
setupForMyMexStuff; % runs once per worker process
end
% Now use the mex stuff:
parfor ii=...
useMyMexStuff ...
end
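The point about separate worker processes can be illustrated outside MATLAB with a minimal C sketch. It assumes a POSIX system and uses fork() only as a convenient way to get a second process (MATLAB workers are launched separately, not forked from the client). The names fake_mex_call, child_count_after_fork and parent_call_count are hypothetical and stand in for any static state in a MEX file, such as a static pthread_mutex_t. What it shows is that a static variable changed in one process is not changed in another, so a static mutex cannot coordinate separate worker processes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Process-local state, standing in for a static variable in a MEX file. */
static int calls_in_this_process = 0;

/* Hypothetical stand-in for a MEX entry point that touches static state. */
void fake_mex_call(void)
{
    calls_in_this_process++;
}

/* Returns the counter as seen by the calling process. */
int parent_call_count(void)
{
    return calls_in_this_process;
}

/* Forks a child, lets it make n_calls calls (n_calls must be small enough
 * for the result to fit in an exit status), and returns the child's counter.
 * Returns -1 if fork() or waitpid() fails. */
int child_count_after_fork(int n_calls)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: modifies only its own copy of the static counter. */
        for (int i = 0; i < n_calls; i++) {
            fake_mex_call();
        }
        _exit(calls_in_this_process & 0xff);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}

#ifdef DEMO_MAIN
int main(void)
{
    int child = child_count_after_fork(3);
    assert(child == 3);
    assert(parent_call_count() == 0); /* parent's copy is untouched */
    printf("child counted %d calls, parent still sees %d\n",
           child, parent_call_count());
    return 0;
}
#endif
```

Compiling with -DDEMO_MAIN gives a standalone program. The same separation applies to the process environment: each process has its own, so a putenv() call in one worker does not race with a putenv() call in another.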
From this and your previous posts, it looks like you're reproducibly hitting the SEGV in the call to 'putenv'. Perhaps you're not setting up the environment variables in some way that your Mex file is expecting.
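As background on why putenv() can be fragile, here is a minimal C sketch of its POSIX semantics. It is not a diagnosis of what CPLEX does internally, which is not known from this thread. The variable names DEMO_PUTENV_VAR and DEMO_SETENV_VAR and the helper functions are made up for illustration. putenv() does not copy its argument: the string itself becomes part of the environment, so it must stay valid for as long as the variable is set, whereas setenv() makes its own copy.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* This buffer itself becomes part of the environment after putenv(),
 * so it needs static storage duration (or must otherwise outlive the entry). */
static char demo_entry[] = "DEMO_PUTENV_VAR=one";

/* Registers demo_entry with putenv(); returns 0 on success. */
int set_with_putenv(void)
{
    return putenv(demo_entry);
}

/* Overwrites the value part of the buffer in place ("one" becomes "two"). */
void overwrite_putenv_buffer(void)
{
    memcpy(demo_entry + strlen("DEMO_PUTENV_VAR="), "two", 3);
}

/* setenv() copies name and value, so the caller's buffers can be reused. */
int set_with_setenv(const char *name, const char *value)
{
    return setenv(name, value, 1);
}

#ifdef DEMO_MAIN
int main(void)
{
    assert(set_with_putenv() == 0);
    printf("after putenv:    %s\n", getenv("DEMO_PUTENV_VAR"));

    /* Changing the buffer changes the environment, because it is aliased. */
    overwrite_putenv_buffer();
    printf("after overwrite: %s\n", getenv("DEMO_PUTENV_VAR"));

    char tmp[] = "copied";
    assert(set_with_setenv("DEMO_SETENV_VAR", tmp) == 0);
    tmp[0] = 'X'; /* does not affect the environment */
    printf("setenv value:    %s\n", getenv("DEMO_SETENV_VAR"));
    return 0;
}
#endif
```

If a library passes putenv() a buffer that is later freed or goes out of scope, later environment lookups can read invalid memory. Whether something like that is involved here would need to be confirmed with the CPLEX developers.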

Asked:

on 23 Apr 2012