How do I update MEX-files to use the large array handling API (-largeArrayDims)?

Question

MathWorks Support Team on 10 Sep 2012


Edited: MathWorks Support Team on 19 Nov 2024 at 3:59

The MEX API has changed to support MATLAB variables with more than 2^31-1 elements. This feature was added in MATLAB Version 7.3 (R2006b). You must compile your MEX-files with the -largeArrayDims flag to opt in to this API.

You may need to update your MEX source code to utilize the new API. In the near future, the MEX build script will use the large-array-handling API by default.


Answer 1

MathWorks Support Team on 19 Nov 2024 at 0:00


Edited: MathWorks Support Team on 19 Nov 2024 at 3:59


largeArrayDimsFaq.txt

The addition of the large array handling API may require you to update your MEX-files. Code changes are required because, to support large arrays, the types of the inputs and outputs of the MATLAB API had to change. We implemented the API changes in a way that reduces the impact on existing users by adopting an "opt-in" strategy: to take advantage of the large array handling feature, you opt in using the MEX flag "-largeArrayDims". If you choose to opt in, you will need to make the updates described in this solution. If you choose not to opt in, you do not need to do anything now. Be aware, though, that in the future the large array handling API will become the default, and you will then need to update your code or take other action.

This solution walks you through a step-by-step process for identifying the changes you need to make. The procedure suggests that you first identify and list variables that are candidates for changing before modifying your code. We recommend that you don't edit your code until you have completed the identification steps.

First, consider a high-level technical explanation of the changes that need to be made. Because the types used by the MEX API have changed, your existing code may pass inputs and outputs to MATLAB with the wrong types. Specifically, the parameter types that changed are those for indices and sizes. Both have changed from 32-bit types to 64-bit compatible types.

The following procedure helps you identify the type mismatches that need to be corrected. It follows a methodical process: first testing the original code, then identifying changes, then updating a copy of the code, building the new code, and finally retesting. Along the way, there are suggestions for dealing with common code patterns, build failures and warnings, and runtime issues.

Note: This document uses primarily C/C++ terminology and example code. Fortran MEX-files share the same issues; you can find additional Fortran specifics in Section 8.8.

In more detail:

1. Test existing code

Before adapting your code to handle large arrays, you should first verify that it works with the traditional 32-bit array dimensions. You may want to build a list of expected inputs and outputs, or even a full test suite. Back up your source code and binaries for safekeeping and so that you can compare the results with your updated source code.

2. Identify variables that contain 64-bit index or size values

In order to handle very large arrays, you will need to convert all variables that contain array indices or sizes to use mwSize / mwIndex data types instead of 32-bit "int". These data types are 64-bit integers when using large array dimensions on 64-bit architectures, and are implemented as preprocessor macros. When using -largeArrayDims, both of these types are the same as the size_t type in C/C++.
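As a rough illustration of how a size type can change width depending on a build option, consider the sketch below. The macro SKETCH_COMPAT_32 and the type names sketch_size and sketch_index are hypothetical stand-ins invented for this example; they are not the definitions used in the MATLAB headers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for mwSize / mwIndex. SKETCH_COMPAT_32 is an
   assumed macro name that plays the role of building WITHOUT
   -largeArrayDims; it is not taken from the MATLAB headers. */
#ifdef SKETCH_COMPAT_32
typedef int sketch_size;
typedef int sketch_index;
#else
typedef size_t sketch_size;
typedef size_t sketch_index;
#endif

/* Number of bits available to hold a dimension in the current build. */
unsigned sketch_size_bits(void)
{
    /* Both stand-ins are meant to have the same width. */
    assert(sizeof(sketch_size) == sizeof(sketch_index));
    return (unsigned)(sizeof(sketch_size) * CHAR_BIT);
}
```

Built as shown, sketch_size_bits() reports the width of size_t on your platform. Defining SKETCH_COMPAT_32 at compile time switches both stand-ins back to "int", which mirrors why code written against the older signatures can still compile with 32-bit array dimensions.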

We suggest looking at three classes of variables: those used directly by the API functions, intermediate variables, and variables that are used as both sizes / indices and 32-bit integers.

2.1. Search for variables used directly by the 64-bit API functions

To identify these variables, look for mx* and mex* functions that take mwSize/mwIndex values as input or output values.

For an accurate list, check the documentation in your version of MATLAB. You can find this list here:

http://www.mathworks.com/help/matlab/matlab_external/handling-large-mxarrays.html

You can find these functions by using your editor's Find function or a utility such as GREP.

For example, in older versions of MATLAB (R2006a and earlier), the signature for mxCreateDoubleMatrix was:

mxArray *mxCreateDoubleMatrix(int m, int n, mxComplexity ComplexFlag);

The signature for mxCreateDoubleMatrix in the large array dimension API is:

mxArray *mxCreateDoubleMatrix(mwSize m, mwSize n, mxComplexity ComplexFlag);

Note the first two input arguments have changed from "int" to "mwSize". Variables passed as the first or second inputs must be declared as mwSize.

You can find the new signature on each function's reference page, reachable through the Help Browser or the DOC command:

doc mxCreateDoubleMatrix

Because mwSize is the same as "int" when using 32-bit array dimensions, your code may still be written to this older signature. In order to take advantage of large array handling, you must update your code to use mwSize/mwIndex to allow for 64-bit size and index values.

Search through your code, and note any variables defined as "int" or similar type. Do not make any edits at this point, simply note which variables need updating.

If your code is:

   int m,n; 
   ...
   YP_OUT = mxCreateDoubleMatrix(m, n, mxREAL); 

both m and n should be declared mwSize:

   mwSize m,n; 
   ...
   YP_OUT = mxCreateDoubleMatrix(m, n, mxREAL); 

2.2. Find intermediate variables

It is possible that your code uses intermediate variables to calculate sizes and indices. If that is the case, you will need to ensure that those variables are also declared as the appropriate type. Consider this example:

   mwSize m,n;
   int numDataPoints;
   m = 3;
   numDataPoints = m * 2;
   n = numDataPoints + 1;
   ...
   YP_OUT = mxCreateDoubleMatrix(m, n, mxREAL);

The first search in Section 2.1 will miss intermediate variables that are not directly used in mx* function calls, such as numDataPoints in this example. These variables will not be large enough to handle 64-bit indices coming from mwSize variables. Storing 64-bit indices in these 32-bit intermediate variables will truncate the indices, leading to incorrect results or crashes. In this example, numDataPoints should be changed to be mwSize, since "m" is mwSize:

   mwSize m,n;
   mwSize numDataPoints;

Inspect your code, and add all such intermediates to the list of variables to convert.

Again, do not make any edits at this point.
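The truncation described in this section can be reproduced outside of MATLAB. The following is a minimal sketch, assuming uint64_t as a stand-in for mwSize under -largeArrayDims and uint32_t as a stand-in for the 32-bit intermediate (chosen so that the wraparound is well-defined in standard C). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* uint64_t stands in for mwSize on a 64-bit platform and uint32_t for a
   32-bit intermediate variable. Both are assumptions of this sketch. */

/* n = 2*m + 1 computed through a 32-bit intermediate (the pattern to avoid). */
uint64_t cols_via_narrow_intermediate(uint64_t m)
{
    uint32_t numDataPoints = (uint32_t)(m * 2u);  /* high 32 bits are dropped here */
    return (uint64_t)numDataPoints + 1u;
}

/* Same calculation with the intermediate widened to the size type. */
uint64_t cols_via_wide_intermediate(uint64_t m)
{
    assert(m <= (UINT64_MAX - 1u) / 2u);  /* keep 2*m + 1 representable */
    uint64_t numDataPoints = m * 2u;
    return numDataPoints + 1u;
}
```

For small inputs such as m = 3, both functions return 7, which is why this kind of bug tends to go unnoticed in ordinary testing. With m = 2^31 + 5, the narrow version returns 11 while the wide version returns 4294967307.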

2.3. Replace variables that serve multiple purposes

It is possible that your code uses the same variable for both indices (which we are converting to 64-bit) and STRUCT field numbers or status codes (both of which stay 32-bit).

Variables that are used for multiple purposes need to be identified, and replaced with two variables --- one mwSize / mwIndex and one 32-bit integer. This situation is especially common when using one variable for both array size and number of STRUCT fields. Another case is when status codes / success flags are also used as indices.

For example, mxCreateDoubleMatrix expects mwSize inputs, but mxCreateStructMatrix requires an int for the number of fields:

mxArray *mxCreateDoubleMatrix(mwSize m, mwSize n, mxComplexity ComplexFlag);
mxArray *mxCreateStructMatrix(mwSize m, mwSize n, int nfields, const char **fieldnames);

Now consider the numSensors variable:

   mxArray *myNumeric, *myStruct;
   int numSensors;
   mwSize m, n;
   char **fieldnames;
   int numFields;
   ...
   myNumeric = mxCreateDoubleMatrix(numSensors, n, mxREAL); 
   myStruct = mxCreateStructMatrix(m, n, numSensors, fieldnames); 

In this example, you need two new variables to replace numSensors to properly handle both functions:

   mxArray *myNumeric, *myStruct;
   mwSize numSensorSize;
   int numSensorFields;
   mwSize m, n;
   char **fieldnames;
   ...
   myNumeric = mxCreateDoubleMatrix(numSensorSize, n, mxREAL); 
   myStruct = mxCreateStructMatrix(m, n, numSensorFields, fieldnames); 

Note which variables will need to be replaced in this way. Again, do not make any edits at this point.
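When the 32-bit variable of such a pair is initialized from a size value, a range check keeps the narrowing explicit. The helper below is one possible sketch, not part of the MEX API: the name field_count_from_size is hypothetical, and uint64_t is assumed as a stand-in for mwSize.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Converts a size value to an int field count.
   Returns -1 when the value does not fit in an int. */
int field_count_from_size(uint64_t size_value)
{
    if (size_value > (uint64_t)INT_MAX) {
        return -1;  /* too large to be used as a field count */
    }
    int nfields = (int)size_value;
    assert((uint64_t)nfields == size_value);  /* conversion was lossless */
    return nfields;
}
```

In the example above, numSensorFields could be set this way from numSensorSize, with the -1 case handled as an error before any struct array is created.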

3. 3rd party libraries

Your MEX-file may involve portions of code that you did not write, and to which you do not have access. These may be entire MEX-files, numerical routines, or device drivers. There are several ways to approach this issue:

3.1. If you do not have access to the source code, contact the vendor. Refer them to this document and discuss options to handle large array dimensions with them.

3.2. If you share code with others, such as publicly available source code, check with the relevant author or user group to see if someone has converted the code to use large array dimensions. If not, you may want to convert the code yourself. Again, apply the suggestions in this document.

4. Create a working copy, make edits, and test with 32-bit dimensions.

At this point, you know which type declarations need to be edited. Make a copy of your source code and change the relevant declarations. Compile your code as usual:

mex myMexFile.c

This uses the traditional 32-bit dimensions. This is the same as using the -compatibleArrayDims flag:

mex -compatibleArrayDims myMexFile.c

Compare with your original binary file; both should return identical results.

If not, debug and resolve any differences. These will be easier to resolve now (using the same 32-bit sizes and indices) than in the next step.

5. Build with -largeArrayDims and resolve build failures & warnings.

You are now ready to compile your MEX-file using the large array handling API. Simply add the -largeArrayDims flag to your compilation; for example, instead of

mex myMexFile.c

use

mex -largeArrayDims myMexFile.c

When using -largeArrayDims, your compiler may refer to mwSize / mwIndex as "size_t", "unsigned __int64", or other similar names.

Most build problems at this point will be related to type mismatches between 32- and 64-bit types. Some common issues you may encounter are:

5.1. Type mismatch in assignment

In some instances --- especially simple assignments --- your C/C++ compiler can warn about type mismatches. For example:

   mwSize m,n;
   int numDataPoints;
   ...
   numDataPoints = m * 2;

When you compile this file using "mex -largeArrayDims buggyMexFile.c", you may receive a type-mismatch warning.

From Microsoft Visual Studio on 64-bit Windows:

ERROR: C:\Work\buggyMexFile.c(31) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data 

This example can be fixed using Section 2.2 to convert the intermediate variable numDataPoints to mwSize.

5.2. Type mismatch in pointer type

When using mxCreate*Array functions that take arrays of dimensions, you will need to convert the pointer from int* to mwSize*:

   mwSize ndim;
   int *myDims;
   ...
   pa1 = mxCreateLogicalArray(ndim, myDims);

From gcc on 64-bit Linux:

ERROR: buggyMexFile.c: In function 'mexFunction':
buggyMexFile.c:35: warning: passing argument 2 of 'mxCreateLogicalArray' from incompatible pointer type

In this case, myDims must be mwSize*, as described in Section 2.1.
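If the dimensions come from code that must keep producing an int array, they can be copied element by element into an array of the wider type rather than passed by casting the pointer. The sketch below assumes uint64_t as a stand-in for mwSize and uses hypothetical function names. The second function shows what a callee expecting 64-bit elements would read from the raw bytes of two 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies ndim legacy int dimensions into a 64-bit array, one element at a
   time. Returns the number of elements described by the dimensions. */
uint64_t widen_dims(const int *legacy, size_t ndim, uint64_t *wide)
{
    uint64_t numel = 1;
    for (size_t i = 0; i < ndim; ++i) {
        assert(legacy[i] >= 0);          /* dimensions are never negative */
        wide[i] = (uint64_t)legacy[i];
        numel *= wide[i];
    }
    return numel;
}

/* What a callee expecting 64-bit elements would take as the first dimension
   if it were handed the raw bytes of two 32-bit values. */
uint64_t first_dim_if_misread(const int32_t pair[2])
{
    uint64_t misread;
    memcpy(&misread, pair, sizeof misread);  /* reinterpret 8 bytes as one value */
    return misread;
}
```

For the dimensions {3, 4}, the misread first dimension is not 3 on either byte order, because the two 32-bit values are fused into a single 64-bit one. This is the reason the pointer warning should not be silenced with a cast.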

6. Execute MEX-file, test, and debug

Compare the results of running your MEX-file compiled with -largeArrayDims with results from your original binary. If there are any differences or failures, use a debugger to investigate the cause.

Debuggers are a key tool for investigating programming issues in your code. For more instructions on debugging MEX-files, see the following documentation topics:

Debugging Fortran Source MEX-files

Debugging C Language MEX-files on Windows Platforms

Debugging C Language MEX-files on Mac Platforms

Debugging C Language MEX-files on Linux Platforms

For more information on the capabilities of your debugger, refer to your compiler documentation.

After resolving any issues with the -largeArrayDims version, your converted MEX-file now replicates the functionality of your original code while using the large array handling API.

The following is a list of common issues you might encounter when running your MEX-files.

6.1. Out of Memory error

One potential error mode you might run into is the "Out of Memory" error. This is one of the most common failure modes when converting code to use large array dimensions. It is most often caused by the use of mismatching types with functions that allocate memory, such as mxCreateDoubleMatrix. The type mismatch corrupts the requested array size, causing MATLAB to try to allocate much more memory than you expect. This value can far exceed the available memory (RAM) of the machine you are using.

The function expects 8 bytes, so it grabs the 4 bytes adjacent to the ones passed in by the 4 byte variable. The extra bytes have random information in them and as a consequence, the function interprets the input as a very different value than what was passed in.

For example, mxCreateDoubleMatrix expects the first argument to be mwSize, which is 64 bits when using -largeArrayDims on a 64-bit machine. Suppose the variable passed as the first argument is an int, which is 32 bits:

   int m,n; 
   m = 10;
   ...
   YP_OUT = mxCreateDoubleMatrix(m, n, mxREAL); 

The value that mxCreateDoubleMatrix receives may contain 32 bits of random information. For example, the value could be interpreted as 85899345930 (10 + 20*2^32). This far exceeds the memory capacity of most machines, and consequently MATLAB reports an Out of Memory error:

ERROR: ??? Error using ==> buggyMexFile
Out of memory. Type HELP MEMORY for your options.
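The arithmetic behind the value 85899345930 can be sketched as follows. The function name is hypothetical, and stray_high_bits simply models whatever happened to occupy the 32 bits next to the variable.

```c
#include <assert.h>
#include <stdint.h>

/* Value seen by a callee that reads 64 bits when the caller only set the
   low 32 bits. stray_high_bits models the adjacent, unrelated 32 bits. */
uint64_t size_as_misread(uint32_t passed_value, uint32_t stray_high_bits)
{
    uint64_t seen = ((uint64_t)stray_high_bits << 32) | passed_value;
    assert((uint32_t)seen == passed_value);  /* low half is still the caller's value */
    return seen;
}
```

With passed_value = 10 and stray_high_bits = 20, the result is 10 + 20*2^32 = 85899345930. A column of doubles with that many rows would need roughly 640 GB.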

If you encounter this issue, change the type of the variable to mwSize or mwIndex as suggested in section 2.1:

   mwSize m,n; 
   m = 10;
   ...
   YP_OUT = mxCreateDoubleMatrix(m, n, mxREAL); 

This case is difficult to debug because "m" and "n" both contain the correct value but have the wrong data type. Your debugger can help identify this. For example, the Microsoft Visual Studio debugger displays a "Type" column that will show "int" for the variables above, and "size_t", "unsigned __int64", or similar names for the correct type.

6.2. Segmentation Violation, Assertion, or other crash

Use your debugger to step through both your original and modified code, looking for the first difference between the two, especially in any variables you modified. This root cause can happen well before the actual crash.

Again, your debugger can identify the data types for each of your values.

6.3. Different results

Use your debugger to step through both your original and modified code, looking for the first difference between the two, especially in any variables you modified. This root cause can happen well before returning results to MATLAB.
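If you can capture the numeric output of both builds in plain buffers, a small helper can report where they first diverge, which gives you a starting point for the debugger. This is only a sketch with a hypothetical function name; how you obtain the two buffers is up to your test setup.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element where the two outputs differ,
   or n when all n elements compare equal. */
size_t first_difference(const double *reference, const double *candidate, size_t n)
{
    assert(reference != NULL && candidate != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (reference[i] != candidate[i]) {
            return i;
        }
    }
    return n;
}
```

Note that the index is a size_t, in keeping with the rest of this document. Because NaN never compares equal to itself, positions holding NaN in both buffers are reported as differences.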

7. Call your MEX-file with gigantic arrays

If you have access to a machine with large amounts of memory, you can now experiment with arrays with more than 2^32-1 elements. Working with these arrays takes large amounts of memory --- an array of double-precision floating point numbers (the default in MATLAB) with 2^32 elements takes ~32 GB of memory.
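To estimate the memory a test array will need before you try to create it, the calculation can be written out as below. The function names are hypothetical, and the sketch counts only the 8 bytes per element of real double data.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by the data of a real double array with numel elements. */
uint64_t double_array_bytes(uint64_t numel)
{
    assert(numel <= UINT64_MAX / 8u);  /* product must fit in 64 bits */
    return numel * 8u;                 /* 8 bytes per double-precision value */
}

/* Same quantity in whole units of 2^30 bytes, rounded down. */
uint64_t double_array_gib(uint64_t numel)
{
    return double_array_bytes(numel) >> 30;
}
```

For 2^32 elements this gives 34359738368 bytes, or 32 units of 2^30 bytes, matching the figure quoted above. An array with 2^31 elements needs half of that.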

For an example that demonstrates the use of large arrays, see the arraySize.c MEX file in Handling Large mxArrays in C MEX Files.

8. Frequently Asked Questions (answered in the attached largeArrayDimsFaq.txt):

8.1. What if I only use 32-bit MATLAB?

8.2. What if I already made the recommended changes to support sparse arrays on 64-bit systems?

8.3. What if I already made the recommended changes to support 64-bit mxArrays?

8.4. What if I already made the recommended changes to my Fortran source files?

8.5. What if I want to opt out (don't want to upgrade)?

8.6. What if I do nothing?

8.7. What if I don’t have source MEX-files?

8.8. What if I'm using Fortran?

8.9. What if I find deprecated functions such as mexPutFull in my code?

2 Comments

Walter Roberson on 25 Oct 2018

The link web([docroot '/techdoc/matlab_external/f13120.html#brb25_6-1']) is obsolete.

Walter Roberson on 22 May 2021

The largeArrayDimsFaq.txt does not appear to be linked to any more, but it seems to exist at https://www.mathworks.com/matlabcentral/answers/uploaded_files/2073/largeArrayDimsFaq.txt
