gpucoder.atomicMin

Atomically compute the minimum of a value and a variable in global or shared memory

Since R2021b

    Description

    The gpucoder.atomicMin function reads a value from a global or shared GPU memory location, compares it to an operand, and writes the minimum value back to the memory location. In generated GPU code, the operation is atomic, which means that a GPU thread performs the read-modify-write operation without interference from other threads.

    A = gpucoder.atomicMin(A,B) compares B to the value of A and writes the value of min(A,B) back into A.

    Call the gpucoder.atomicMin function directly inside a for-loop that you want to execute on the GPU in the generated CUDA® code. Each iteration of the loop must be able to read and write to A.


    [A,oldA] = gpucoder.atomicMin(A,B) returns the previous value of A as oldA. When you use this syntax, use the coder.gpu.kernel pragma before the loop that contains gpucoder.atomicMin.

    Examples


    Perform a simple atomic minimum operation by using the gpucoder.atomicMin function and generate CUDA code that calls the corresponding CUDA atomicMin() API.

    In one file, write an entry-point function myAtomicMin that accepts matrix inputs a and b.

    function a = myAtomicMin(a,b)
    coder.gpu.kernelfun;
    
    for i = 1:numel(a)
        a(i) = gpucoder.atomicMin(a(i), b);
    end
    
    end
    

    To create a type for a uint32 matrix for use in code generation, use the coder.newtype function. The third argument marks each dimension as fixed-size (0) or variable-size (1), so A is a row vector with up to 30 elements and B is a fixed-size scalar.

    A = coder.newtype('uint32', [1 30], [0 1]);
    B = coder.newtype('uint32', [1 1], [0 0]);
    inputArgs = {A,B};
    

    To generate a CUDA library, use the codegen function.

    cfg = coder.gpuConfig('lib');
    cfg.GenerateReport = true;
    
    codegen -config cfg -args inputArgs myAtomicMin -d myAtomicMin
    

    The generated CUDA code contains the myAtomicMin_kernel1 kernel with calls to the atomicMin() CUDA APIs.

    //
    // File: myAtomicMin.cu
    //
    ...
    
    static __global__ __launch_bounds__(1024, 1) void myAtomicMin_kernel1(
        const uint32_T b, const int32_T i, uint32_T a_data[])
    {
      uint64_T loopEnd;
      uint64_T threadId;
    ...
    
      for (uint64_T idx{threadId}; idx <= loopEnd; idx += threadStride) {
        int32_T b_i;
        b_i = static_cast<int32_T>(idx);
        atomicMin(&a_data[b_i], b);
      }
    }
    ...
    
    void myAtomicMin(uint32_T a_data[], int32_T a_size[2], uint32_T b)
    {
      dim3 block;
      dim3 grid;
    ...
    
        cudaMemcpy(gpu_a_data, a_data, a_size[1] * sizeof(uint32_T),
                   cudaMemcpyHostToDevice);
        myAtomicMin_kernel1<<<grid, block>>>(b, i, gpu_a_data);
        cudaMemcpy(a_data, gpu_a_data, a_size[1] * sizeof(uint32_T),
                   cudaMemcpyDeviceToHost);
    ...
    
    }
    

    Input Arguments


    A — Reference to a shared or global GPU memory location, specified as a scalar. Use the reference as both an input and an output argument of gpucoder.atomicMin.

    Data Types: int32 | uint32 | uint64

    B — Operand to compare against A, specified as a scalar.

    Data Types: int32 | uint32 | uint64

    Extended Capabilities


    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Code Generation
    Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

    Version History

    Introduced in R2021b