Maximum blocks per kernel

Maximum number of thread blocks per GPU kernel

Since R2021a

Model Configuration Pane: Code Generation / GPU Code

Description

The Maximum blocks per kernel parameter specifies the maximum number of CUDA® blocks created during a kernel launch.

Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading, and unloading of blocks.

If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.
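
The striding described above can be pictured with a small host-side emulation. This is a sketch in plain C++, not code that GPU Coder emits: the function name simulateStridedLaunch and its parameters are hypothetical, and the assumption is that the kernel uses the common grid-stride pattern, in which each thread starts at its global index and advances by the total number of launched threads until it passes the end of the loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not GPU Coder output: emulates on the host how a 1-D
// grid capped at maxBlocks covers n loop iterations with a grid-stride loop.
// Returns how many times each iteration index was visited.
std::vector<int> simulateStridedLaunch(std::size_t n,
                                       std::size_t threadsPerBlock,
                                       std::size_t maxBlocks) {
    // This sketch only models a positive block limit and block size.
    assert(threadsPerBlock > 0 && maxBlocks > 0);

    std::vector<int> visits(n, 0);
    if (n == 0) return visits;

    // Blocks needed for one thread per iteration, then capped at the limit.
    std::size_t wanted = (n + threadsPerBlock - 1) / threadsPerBlock;
    std::size_t blocks = std::min(wanted, maxBlocks);

    // Total threads in the grid; in CUDA terms, gridDim.x * blockDim.x.
    std::size_t stride = blocks * threadsPerBlock;

    for (std::size_t b = 0; b < blocks; ++b) {               // blockIdx.x
        for (std::size_t t = 0; t < threadsPerBlock; ++t) {  // threadIdx.x
            // Start at the global thread index, then jump by the grid size.
            for (std::size_t i = b * threadsPerBlock + t; i < n; i += stride) {
                ++visits[i];
            }
        }
    }
    return visits;
}
```

For example, with 1000 iterations, 32 threads per block, and a limit of 4 blocks, the grid holds 128 threads, so each thread handles up to 8 iterations and every iteration is still visited exactly once.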

When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel (GPU Coder) pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each CUDA kernel.

Dependencies

  • This parameter requires a GPU Coder™ license.

  • To enable this parameter, select Generate GPU code on the Code Generation pane.

Settings

0 (default) | integer

Specify the maximum number of CUDA blocks created during a kernel launch.

Recommended Settings

Application          Setting
Debugging            No impact
Traceability         No impact
Efficiency           No impact
Safety precaution    No impact

Programmatic Use

Parameter: GPUMaximumBlocksPerKernel
Type: integer
Value: any valid value
Default: 0

Version History

Introduced in R2021a