Poor performance in mex-files created using mexcuda

Question

I've noticed that MATLAB GPU-enabled code suddenly becomes VERY slow when a seemingly innocent line is added. I therefore want to call CUDA from a MEX-file; in particular, I'd like to call cuFFT. However, when I compile a MEX-file that calls cuFFT, I get very poor performance compared to MATLAB's fft (which seems to use cuFFT). What is going on? The cuFFT part of the MEX-file seems to run 50-200 times slower than MATLAB's fft calling the same library. Is mexcuda optimized? Here is the code I used. It may, of course, contain some error of mine:

% Matlab side code
% Compile using:
% >> mexcuda -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\x64" -lcufft abc.cu
A = gpuArray.randn(600,600,32,'single') + 1i*randn(600,600,32,'single');
tic
B = abc(A);
toc;
tic, for ii = 1:30, B = fft2(A); end; toc
AA = gather(A);
tic, for ii = 1:30, B = fft2(AA); end; toc
%% Output from a run
>> testabc
Elapsed time is 0.193155 seconds. % Mex file
Elapsed time is 0.004172 seconds. % Matlab fft2
Elapsed time is 1.455618 seconds. % Matlab CPU
// Mex-file code in the file abc.cu
#include "mex.h"
#include "gpu/mxGPUArray.h"
#include <cufft.h>
// Internal type for complex data. Same layout as cufftComplex, just another name.
typedef float2 Complex;
/*
 * Host code (MEX gateway). There is no custom device kernel; cuFFT does the work.
 */
void mexFunction(int nlhs, mxArray *plhs[],
                 int nrhs, mxArray const *prhs[])
{
    char const * const errId = "parallel:gpu:mexGPUExample:InvalidInput";
    char const * const errMsg = "Invalid input to MEX file.";
    /* Declare all variables. */
    mxGPUArray const *A;
    mxGPUArray *B;
    Complex const *pA;
    Complex *pB;
    /* Initialize the MathWorks GPU API. */
    mxInitGPU();
    /* Throw an error if the input is not exactly one GPU array. */
    if (nrhs != 1 || !mxIsGPUArray(prhs[0])) {
        mexErrMsgIdAndTxt(errId, errMsg);
    }
    A = mxGPUCreateFromMxArray(prhs[0]);
    // Verify that the input is a complex single 3-D array before extracting
    // the pointer; the casts and the dimSize[2] access below rely on this.
    if (mxGPUGetClassID(A) != mxSINGLE_CLASS ||
        mxGPUGetComplexity(A) != mxCOMPLEX ||
        mxGPUGetNumberOfDimensions(A) != 3) {
        mxGPUDestroyGPUArray(A);
        mexErrMsgIdAndTxt(errId, errMsg);
    }
    /* Get the pointer to the data. */
    pA = (Complex const *)(mxGPUGetDataReadOnly(A));
    /* Query the dimensions once; the returned array must be released with mxFree. */
    mwSize const *dimSize = mxGPUGetDimensions(A);
    /* Create a GPUArray to hold the result and get its underlying pointer. */
    B = mxGPUCreateGPUArray(mxGPUGetNumberOfDimensions(A),
                            dimSize,
                            mxGPUGetClassID(A),
                            mxGPUGetComplexity(A),
                            MX_GPU_DO_NOT_INITIALIZE);
    pB = (Complex *)(mxGPUGetData(B));
    // Now we can do work!
    // FFT test
    cufftHandle plan;
    int dd[2];
    // cuFFT expects the slowest-varying dimension first, while MATLAB stores
    // arrays column-major, so the column count goes in dd[0]. (For the square
    // 600x600 test case the order makes no difference.)
    dd[0] = (int) dimSize[1];
    dd[1] = (int) dimSize[0];
    int Nq = (int) dimSize[2];   // number of 2-D transforms in the batch
    int L = 30;                  // repeat count, to match the MATLAB timing loops
    if (cufftPlanMany(&plan, 2, dd, NULL, 0, 0, NULL, 0, 0, CUFFT_C2C, Nq) != CUFFT_SUCCESS) {
        mxFree((void *) dimSize);
        mxGPUDestroyGPUArray(A);
        mxGPUDestroyGPUArray(B);
        mexErrMsgIdAndTxt(errId, "cufftPlanMany failed.");
    }
    for (int i = 0; i < L; i++)
    {
        // Do the FFT (out of place: the input is only read, despite the cast).
        cufftExecC2C(plan, (cufftComplex *) pA, (cufftComplex *) pB, CUFFT_FORWARD);
    }
    /* Wrap the result up as a MATLAB gpuArray for return. */
    plhs[0] = mxGPUCreateMxArrayOnGPU(B);
    // Free resources
    cufftDestroy(plan);
    mxFree((void *) dimSize);
    mxGPUDestroyGPUArray(A);
    mxGPUDestroyGPUArray(B);
}

1 Comment

Edric Ellis on 8 Jun 2016

I suggest using gputimeit to time code running on the GPU, as this takes account of the asynchronous nature of GPU calls.
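
A minimal sketch of that suggestion, assuming Parallel Computing Toolbox and a supported GPU are available. The array size is taken from the question; the handle f and the variable names are only illustrative. Keep in mind that abc as posted runs 30 transforms per call, so its time should be compared against 30 calls to fft2.

```matlab
% Complex single test data created directly on the GPU (size from the question).
A = complex(randn(600,600,32,'single','gpuArray'), ...
            randn(600,600,32,'single','gpuArray'));

% gputimeit calls the function several times and synchronizes with the
% device, so the measurement covers the GPU work and not just the launch.
f = @() fft2(A);          % use @() abc(A) to time the MEX-file instead
tFft = gputimeit(f);      % typical time in seconds for one call

fprintf('fft2 on the GPU: %.6f s per call\n', tFft);
```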

Answer 1

Joss Knight on 8 Jun 2016

I implemented your code and got the same results as you. Then I added gpu = gpuDevice at the top and wait(gpu) before each call to tic and toc, and found that your code was slightly faster. So I think it's just to do with the way you're timing.
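
The change described above might look roughly like this. It is a sketch rather than the exact script used for this answer: it assumes a supported GPU, the variable names are illustrative, and the abc branch only runs if the compiled MEX-file from the question is on the path.

```matlab
gpu = gpuDevice;    % handle to the currently selected GPU

A = complex(randn(600,600,32,'single','gpuArray'), ...
            randn(600,600,32,'single','gpuArray'));

% Built-in fft2: let pending GPU work finish before starting and before
% stopping the timer, so toc measures execution and not just the launch.
wait(gpu); tic
for ii = 1:30
    B = fft2(A);
end
wait(gpu); tFft = toc;
fprintf('30 x fft2 on the GPU: %.6f s\n', tFft);

% MEX-file from the question (it already loops 30 times internally).
if exist('abc', 'file') == 3    % 3 means a MEX-file was found
    wait(gpu); tic
    B = abc(A);
    wait(gpu); tMex = toc;
    fprintf('abc (30 x cuFFT):     %.6f s\n', tMex);
end
```

Without the wait(gpu) calls, toc can return as soon as the work has been queued, which is why the original fft2 loop appeared to take only a few milliseconds.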
