GPU output array has wrong dimensions
When I run a kernel with feval, the output array has the wrong dimensions. The declaration in MATLAB is:
hits_gpu = gpuArray.zeros(numRays * numTriangles, 1, 'single'); % Hits array (single precision, one entry per ray-triangle pair)
The kernel is:
__global__ void moller_trumbore(
float* rayOrig_x,
float* rayOrig_y,
float* rayOrig_z,
float* rayDir_x,
float* rayDir_y,
float* rayDir_z,
float* vert0_x,
float* vert0_y,
float* vert0_z,
float* vert1_x,
float* vert1_y,
float* vert1_z,
float* vert2_x,
float* vert2_y,
float* vert2_z,
float* hits,
float* t_vals,
float* u_vals,
float* v_vals,
int numRays,
int numTriangles)
{
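// Calculate ray index and triangle index from thread index
int rayIdx = blockIdx.x * blockDim.x + threadIdx.x; // Ray index
int triangleIdx = blockIdx.y * blockDim.y + threadIdx.y; // Triangle index
// Skip threads outside the problem size (possible when the grid size is rounded up)
if (rayIdx >= numRays || triangleIdx >= numTriangles) return;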
// Load ray origin and direction components
float orig_x = rayOrig_x[rayIdx];
float orig_y = rayOrig_y[rayIdx];
float orig_z = rayOrig_z[rayIdx];
float dir_x = rayDir_x[rayIdx];
float dir_y = rayDir_y[rayIdx];
float dir_z = rayDir_z[rayIdx];
// Load triangle vertices from separate coordinate arrays
float v0_x = vert0_x[triangleIdx];
float v0_y = vert0_y[triangleIdx];
float v0_z = vert0_z[triangleIdx];
float v1_x = vert1_x[triangleIdx];
float v1_y = vert1_y[triangleIdx];
float v1_z = vert1_z[triangleIdx];
float v2_x = vert2_x[triangleIdx];
float v2_y = vert2_y[triangleIdx];
float v2_z = vert2_z[triangleIdx];
// Calculate edges for triangle
float edge1_x = v1_x - v0_x;
float edge1_y = v1_y - v0_y;
float edge1_z = v1_z - v0_z;
float edge2_x = v2_x - v0_x;
float edge2_y = v2_y - v0_y;
float edge2_z = v2_z - v0_z;
// Calculate determinant using cross product and dot product
float h_x = dir_y * edge2_z - dir_z * edge2_y;
float h_y = dir_z * edge2_x - dir_x * edge2_z;
float h_z = dir_x * edge2_y - dir_y * edge2_x;
float a = edge1_x * h_x + edge1_y * h_y + edge1_z * h_z;
float f = 1.0f / a;
float s_x = orig_x - v0_x;
float s_y = orig_y - v0_y;
float s_z = orig_z - v0_z;
float u = f * (s_x * h_x + s_y * h_y + s_z * h_z);
// Calculate q vector and v
float q_x = s_y * edge1_z - s_z * edge1_y;
float q_y = s_z * edge1_x - s_x * edge1_z;
float q_z = s_x * edge1_y - s_y * edge1_x;
float v = f * (dir_x * q_x + dir_y * q_y + dir_z * q_z);
// Calculate t to find the intersection point
float t = f * (edge2_x * q_x + edge2_y * q_y + edge2_z * q_z);
//hits[rayIdx * numTriangles + triangleIdx] = 1; // Mark as hit
hits[rayIdx * numTriangles + triangleIdx] = 1.0f; // Mark as hit
}
The parameters of the kernel call are:
numRays = 512; % Example number of rays
numTriangles = 512; % Example number of triangles
blockSize = [16,16]; % Block size (adjust based on your hardware)
gridSize = [ceil(numRays / blockSize(1)), ceil(numTriangles / blockSize(2))];
k.ThreadBlockSize = blockSize;
k.GridSize = gridSize;
The array returned by the kernel launch has numRays elements instead of numRays * numTriangles. Why?
Launching MATLAB from a terminal and checking with printf inside the kernel shows that hits is written the right number of times, so the problem seems to occur when control returns to MATLAB.
Accepted Answer
Joss Knight
on 19 Sep 2024
The outputs of your kernel will be the non-const pointer inputs to your kernel in the order they appear in your function signature. The launch parameters do not change that.
It looks like you are passing your desired output as the 16th non-const input, so perhaps you are simply not retrieving that output? If that is the only kernel output, make all the other kernel inputs const pointers and then hits will be the only kernel output, or just reorder the arguments so that it's the first input, and therefore the first output.
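As a rough illustration of the first suggestion: with the signature as posted, all 19 pointer arguments are non-const, so feval can return up to 19 outputs and hits is the 16th. The first output would then be rayOrig_x, which has numRays elements and would explain the size reported in the question. The sketch below is only an assumption about how the call might look. The file names moller_trumbore.ptx and moller_trumbore.cu are hypothetical, and the inputs are random placeholders rather than real geometry.

```matlab
% Hypothetical file names; the PTX is assumed to come from: nvcc -ptx moller_trumbore.cu
k = parallel.gpu.CUDAKernel('moller_trumbore.ptx', 'moller_trumbore.cu');

numRays = 512;
numTriangles = 512;
blockSize = [16, 16];
k.ThreadBlockSize = blockSize;
k.GridSize = [ceil(numRays / blockSize(1)), ceil(numTriangles / blockSize(2))];

% Placeholder inputs (random data, for illustration only)
rayIn = cell(1, 6);            % rayOrig_x/y/z, rayDir_x/y/z
for i = 1:6
    rayIn{i} = rand(numRays, 1, 'single', 'gpuArray');
end
triIn = cell(1, 9);            % vert0_x/y/z, vert1_x/y/z, vert2_x/y/z
for i = 1:9
    triIn{i} = rand(numTriangles, 1, 'single', 'gpuArray');
end
outBuf = cell(1, 4);           % hits, t_vals, u_vals, v_vals
for i = 1:4
    outBuf{i} = zeros(numRays * numTriangles, 1, 'single', 'gpuArray');
end

% Request every output; they come back in signature order
outs = cell(1, 19);
[outs{:}] = feval(k, rayIn{:}, triIn{:}, outBuf{:}, numRays, numTriangles);

hits_gpu = outs{16};           % hits is the 16th pointer argument
disp(size(hits_gpu))           % compare against numRays * numTriangles
```

If the read-only kernel arguments were instead declared as const float* and the kernel recompiled, only hits, t_vals, u_vals and v_vals would come back, so hits would be the first output of feval.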