mexcuda compiler error: "__global__" does not apply here

I'm writing a MEX CUDA function that applies the distance formula to each pixel of an image, where the pixel locations are in x and y respectively, and x0 and z0 (as named in the kernel signature) contain the locations for an array of transducers. The function details aren't important for now, because I cannot compile when this kernel is in the code:
__global__ void distance_formula_index_units(uint32_t * delay, const double * x, const double * z, const double * x0, const double * z0, params * p) {
    int x_px = threadIdx.x;
    int y_px = threadIdx.y;
    int sen = blockIdx.x;
    int m_id = blockDim.x * blockDim.y + sen;
    double x_dist = x[x_px] - x0[sen];
    double y_dist = z[y_px] - z0[sen];
    double distance = sqrt(x_dist*x_dist + y_dist*y_dist);
    delay[m_id] = (uint32_t)(distance / p->c / p->dt);
}
Some context: delay is the output matrix of time delays, and p points to a typedef'd struct from earlier in the code that holds parameters about the image and system, such as the wavespeed c and the sample time resolution dt.
typedef struct params {
    size_t n_xp;
    size_t n_yp;
    size_t n_sens;
    double dt;
    double c;
} params;
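For reference, the per-pixel arithmetic described above can be written as a small helper that builds both as plain C++ and under nvcc. This is only a sketch under stated assumptions: the name delay_in_samples, the HOST_DEVICE macro, and the explicit <cstdint> include are illustrative and do not come from the original code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Expands to CUDA qualifiers under nvcc and to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not in the original post): delay, in whole samples,
// between a pixel at (x, z) and a sensor at (x0, z0).
// c is the wavespeed and dt the sample period, as in the params struct.
HOST_DEVICE inline std::uint32_t delay_in_samples(double x, double z,
                                                  double x0, double z0,
                                                  double c, double dt) {
    assert(c > 0.0 && dt > 0.0);  // guard against division by zero
    const double x_dist = x - x0;
    const double z_dist = z - z0;
    const double distance = std::sqrt(x_dist * x_dist + z_dist * z_dist);
    // Same arithmetic as the kernel: the cast truncates toward zero.
    return static_cast<std::uint32_t>(distance / c / dt);
}
```

Keeping the arithmetic in a host/device helper like this makes it possible to check the numbers on the CPU before launching anything on the GPU.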
I run "mexcuda calculate_delays.cu", and instead of compiling, I get:
Error using mex
.\06_MEX_Functions\calculate_delays.cu(14): error: attribute "__global__" does not apply here
__declspec(__global__) void distance_formula_distance_units(uint32_t * delay, const double * xx, ...
^
I have only included this one function from the code, because it successfully compiles when I comment out this specific kernel, and the logic gate that activates it. There's another kernel in the code as well (distance_formula_time_units). The code that calls this function ALSO throws a different error:
if (*use_index_units) {
    mxGPUArray * delaymatrix = mxGPUCreateGPUArray(3, output_dimensions, mxUINT32_CLASS, mxREAL, MX_GPU_DO_NOT_INITIALIZE);
    uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
    distance_formula_index_units
        <<<input_parameters->n_xp*input_parameters->n_yp, input_parameters->n_sens>>>
        (delay_dvc_int, x_arr_dvc, y_arr_dvc, x0_dvc, z0_dvc, input_parameters);
    cudaDeviceSynchronize();
    // copy delay matrix from device back to host, set output
    outputs[0] = mxGPUCreateMxArrayOnCPU(delaymatrix);
    mxGPUDestroyGPUArray(delaymatrix);
} else { ...(call the other kernel)
Commenting out the distance_formula_index_units kernel gives a compiler error on this block:
.\06_MEX_Functions\calculate_delays.cu(101): error: identifier "uint32_t" is undefined
uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
^
This is wild, because uint32_t is completely defined in line 24 of stdint.h, and this is in the include tree of mex.h.
As far as I can tell, my code is completely valid, and intellisense in VSCode thinks so, too. ChatGPT doesn't find any code errors, either. What is going on here?

Accepted Answer

Joss Knight on 24 Jun 2024
Edited: Joss Knight on 24 Jun 2024
You cannot call a function declared __global__ from another function declared __global__. Declare the second function as __device__ and that should work.
Your device functions and kernels need to be declared using supported CUDA device types, whereas you are using host-side types defined by the MEX headers. Try uint32_T instead (i.e. capital T).
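One way to keep the type choice in a single place is a small alias, sketched below. It assumes the build defines MATLAB_MEX_FILE (as mex normally does) and that tmwtypes.h from the MATLAB include directory provides uint32_T; outside a MEX build it falls back to std::uint32_t from an explicitly included <cstdint>. The names delay_t and to_delay are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mex/mexcuda builds define MATLAB_MEX_FILE and can see tmwtypes.h,
// which supplies uint32_T. Any other build uses the standard fixed-width type.
#ifdef MATLAB_MEX_FILE
#include "tmwtypes.h"
typedef uint32_T delay_t;
#else
typedef std::uint32_t delay_t;
#endif

// Fail at compile time if the alias is ever not 32 bits wide.
static_assert(sizeof(delay_t) == 4, "delay_t must be 32 bits wide");

// Hypothetical host-side helper: truncate a non-negative sample count to delay_t.
inline delay_t to_delay(double samples) {
    // Converting a negative or too-large double to an unsigned type is undefined.
    assert(samples >= 0.0 && samples < 4294967296.0);
    return static_cast<delay_t>(samples);
}
```

With an alias like this, the kernel signature and the mxGPUGetData cast can both use delay_t, so switching between uint32_T and uint32_t is a one-line change.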
  1 Comment
Nathan on 24 Jun 2024
Excellent, great catch on the capital vs. lowercase T in uint32_T, that fixed the compiler issues completely! (Did not need to declare the second function as __device__)
