gpucoder.reduce
Optimized GPU implementation for reduction operations
Syntax
S = gpucoder.reduce(A,{FUN_1,FUN_2,...,FUN_N})
S = gpucoder.reduce(___,Name=Value)
Description
S = gpucoder.reduce(A,{FUN_1,FUN_2,...,FUN_N}) aggregates the values in the input array A to N values by using the N function handles in the cell array. The output S is a 1-by-N array, where N is the number of function handles.
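The following is a rough CPU-side model of these semantics, not the generated code. It treats A as a flat sequence of elements and each function handle as a binary function that combines two values into one. `reduce_multi` is a hypothetical helper name. Python's `operator.add` and `max` stand in for MATLAB function handles such as a user-defined sum or max function.

```python
import operator
from functools import reduce


def reduce_multi(values, funcs):
    """Hypothetical reference model: fold all elements once per binary function.

    Returns a list with one result per function, mirroring the 1-by-N output S.
    """
    if not values:
        raise ValueError("reduce_multi needs at least one element")
    return [reduce(f, values) for f in funcs]


A = [3, 1, 4, 1, 5, 9, 2, 6]
S = reduce_multi(A, [operator.add, max])
print(S)  # [31, 9]
```

A sequential fold like this can serve as a simple baseline when reasoning about what each entry of S should contain.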
The generated GPU code uses CUDA® shuffle (shfl) intrinsics to perform the reduction operations on the GPU. The reductions for all of the function handles are performed inside a single kernel on the GPU.
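The following Python simulation illustrates how several reductions can share one kernel. It models a single 32-lane warp in which every lane keeps one partial result per function. The lanes then combine their partials in the shuffle-down tree pattern (offsets 16, 8, 4, 2, 1). This is a conceptual sketch under stated assumptions, not the code GPU Coder generates. `warp_reduce` is a hypothetical name. The lockstep snapshot stands in for a warp shuffle such as CUDA's `__shfl_down_sync`. A real kernel must also combine results across warps and thread blocks. Because the combination order differs from a left-to-right fold, the sketch assumes each function is associative and commutative.

```python
import operator

WARP_SIZE = 32  # threads per CUDA warp


def warp_reduce(values, funcs):
    """Simulate one warp reducing `values` with every function in `funcs` in one pass.

    Hypothetical model: each lane keeps one partial result per function, then lanes
    combine partials in the shuffle-down tree pattern (offsets 16, 8, 4, 2, 1).
    """
    if not values:
        raise ValueError("warp_reduce needs at least one element")

    # Step 1: each lane folds its strided share of the input (None = no elements).
    lanes = [None] * WARP_SIZE
    for idx, x in enumerate(values):
        lane = idx % WARP_SIZE
        if lanes[lane] is None:
            lanes[lane] = [x] * len(funcs)
        else:
            lanes[lane] = [f(p, x) for f, p in zip(funcs, lanes[lane])]

    # Step 2: tree combination. Every lane reads from a snapshot taken before the
    # step, mimicking the lockstep exchange of a shuffle instruction. Lanes whose
    # partner would be out of range are skipped and keep their value.
    offset = WARP_SIZE // 2
    while offset > 0:
        snapshot = list(lanes)
        for lane in range(WARP_SIZE - offset):
            src = snapshot[lane + offset]
            if src is None:
                continue  # partner lane holds no data
            if snapshot[lane] is None:
                lanes[lane] = src
            else:
                lanes[lane] = [f(a, b) for f, a, b in zip(funcs, snapshot[lane], src)]
        offset //= 2

    # Lane 0 ends up holding the combined result for every function.
    return lanes[0]


print(warp_reduce(list(range(1, 101)), [operator.add, max, min]))  # [5050, 100, 1]
```

All functions advance together at each step, so in this model adding another function handle enlarges the per-lane payload rather than adding another pass over the data.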
S = gpucoder.reduce(___,Name=Value) aggregates the values in the input array using the options specified by one or more name-value arguments.
Examples
Input Arguments
Name-Value Arguments
Output Arguments
Limitations
gpucoder.reduce does not support reducing complex arrays.
For code generation, gpucoder.reduce accepts a limited number of user-defined function handles, based on the size of the output data type. For example, you can input up to 46 function handles that output the half data type, or up to 11 function handles that output the double data type. If you input too many function handles, code generation fails with an error.
For inputs of an integer data type, the generated code may contain intermediate computations that saturate. In this case, the results from the generated code may not match the simulation results from MATLAB®.
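The integer caveat arises because saturating addition is not associative: clamping an intermediate value loses information. As a result, the final value depends on the order in which elements are combined. The sketch below uses int8-style clamping, which is how MATLAB integer arithmetic behaves. It compares a left-to-right fold against the pairwise order of a shuffle-style tree. `sat_add_int8` and `tree_reduce` are hypothetical helpers. Neither order is claimed to be the exact one used by MATLAB or by the generated code.

```python
from functools import reduce

INT8_MIN, INT8_MAX = -128, 127


def sat_add_int8(a, b):
    """Add two int8 values, clamping the result to [-128, 127] (saturating arithmetic)."""
    return max(INT8_MIN, min(INT8_MAX, a + b))


def tree_reduce(f, values):
    """Pairwise reduction in shuffle-down order; len(values) must be a power of two."""
    vals = list(values)
    if not vals or len(vals) & (len(vals) - 1):
        raise ValueError("length must be a nonzero power of two")
    offset = len(vals) // 2
    while offset > 0:
        vals = [f(vals[i], vals[i + offset]) for i in range(offset)]
        offset //= 2
    return vals[0]


A = [100, 100, -100, -100]  # exact sum is 0
print(reduce(sat_add_int8, A))  # left to right: 127 -> 27 -> -73, prints -73
print(tree_reduce(sat_add_int8, A))  # pairs (100 + -100) twice, then 0 + 0: prints 0
```

The two orders disagree even though both use the same saturating operation. This is the kind of mismatch the limitation above describes.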