Maximum blocks per kernel

Maximum number of thread blocks per GPU kernel

Since R2021a

Model Configuration Pane: Code Generation / GPU Code

Description

The Maximum blocks per kernel parameter specifies the maximum number of CUDA® blocks created during a kernel launch.

Because GPU devices have limited streaming multiprocessor (SM) resources, limiting the number of blocks for each kernel can avoid performance losses from scheduling, loading, and unloading of blocks.

If the number of iterations in a loop is greater than the maximum number of blocks per kernel, the code generator creates CUDA kernels with striding.
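As an illustration of what striding means, the following hand-written sketch shows a generic CUDA grid-stride loop. It is not the code that the code generator emits. The names scaleStrided and runStridedScale, the scaling operation, and the way the grid size is capped at maxBlocks are all assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Grid-stride kernel: each thread starts at its global index and then
// advances by the total number of threads in the launch, so a capped
// grid still covers all n iterations.
__global__ void scaleStrided(const float* in, float* out, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;  // total threads in this launch
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        out[i] = factor * in[i];
    }
}

// Hypothetical host helper (assumes n > 0): caps the number of blocks at
// maxBlocks, launches the kernel, and checks every output element.
bool runStridedScale(int n, int threadsPerBlock, int maxBlocks)
{
    std::vector<float> hostIn(n), hostOut(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        hostIn[i] = static_cast<float>(i);
    }

    float* devIn = nullptr;
    float* devOut = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    if (cudaMalloc(&devIn, bytes) != cudaSuccess) {
        return false;
    }
    if (cudaMalloc(&devOut, bytes) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }
    cudaMemcpy(devIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);

    // Assumed capping rule for this sketch: use as many blocks as the
    // loop needs, but never more than maxBlocks.
    int blocksNeeded = (n + threadsPerBlock - 1) / threadsPerBlock;
    int blocks = blocksNeeded < maxBlocks ? blocksNeeded : maxBlocks;

    scaleStrided<<<blocks, threadsPerBlock>>>(devIn, devOut, n, 2.0f);

    // The blocking copy waits for the kernel and reports launch failures.
    cudaError_t status =
        cudaMemcpy(hostOut.data(), devOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    if (status != cudaSuccess) {
        return false;
    }

    for (int i = 0; i < n; ++i) {
        if (hostOut[i] != 2.0f * hostIn[i]) {
            return false;
        }
    }
    return true;
}
```

For example, calling runStridedScale(100000, 256, 8) launches only 8 blocks of 256 threads, so each thread processes multiple loop iterations instead of one.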

When you specify the maximum number of blocks for each kernel, the code generator creates 1-D kernels. To force the code generator to create 2-D or 3-D kernels, use the coder.gpu.kernel (GPU Coder) pragma. The coder.gpu.kernel pragma takes precedence over the maximum number of blocks for each kernel.

Dependencies

This parameter requires a GPU Coder™ license.
To enable this parameter, select Generate GPU code on the Code Generation pane.

Settings

0 (default) | integer

Specify the maximum number of CUDA blocks created during a kernel launch.

Recommended Settings

Application	Setting
Debugging	No impact
Traceability	No impact
Efficiency	No impact
Safety precaution	No impact

Programmatic Use

Parameter: GPUMaximumBlocksPerKernel

Type: integer

Value: any valid value

Default: 0

Version History

Introduced in R2021a