gpucoder.atomicAdd

Atomically add a value to a variable in global or shared memory

Since R2021b

    Description

    The gpucoder.atomicAdd function reads a value from a global or shared GPU memory location, adds a specified value to it, and writes the result back to the same memory location. In generated GPU code, the operation is atomic, which means that a GPU thread performs the read-modify-write operation without interference from other threads.

    A = gpucoder.atomicAdd(A,B) atomically adds the value of B to the value stored in the global or shared GPU memory location A and writes the result back to A.

    Call the gpucoder.atomicAdd function directly inside a for-loop that you want to execute on the GPU in the generated CUDA® code. Each iteration of the loop must be able to read and write to A.

    [A,oldA] = gpucoder.atomicAdd(A,B) returns the previous value of A as oldA. When you use this syntax, use the coder.gpu.kernel pragma before the loop that contains gpucoder.atomicAdd.
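
    As a minimal sketch of the two-output syntax, the hypothetical entry-point function below extends a histogram loop so that each iteration also records the count it observed before its own increment. The names numberHistogramWithRank and rank are illustrative and not part of the product. The sketch assumes that indexed array elements are accepted as output arguments in generated code.

    ```matlab
    function [counts, rank] = numberHistogramWithRank(X)
    % Hypothetical example: count integers 0..31 in X and record, for each
    % element, the value that its counter held just before the increment.
    counts = zeros(1, 32);
    rank = zeros(size(X));   % stays 0 for elements outside the range 0..31

    % The two-output syntax calls for the kernel pragma before the loop.
    coder.gpu.kernel;
    for i = 1:numel(X)
        curr = X(i);
        if 0 <= curr && curr < 32
            % oldA is returned in rank(i); counts(curr+1) receives the sum.
            [counts(curr+1), rank(i)] = gpucoder.atomicAdd(counts(curr+1), 1);
        end
    end
    end
    ```

    If the function runs sequentially in MATLAB (assuming gpucoder.atomicAdd behaves as ordinary addition there), rank(i) is the number of earlier elements equal to X(i). In generated GPU code, threads run in no fixed order, so the values in rank can differ between runs even though counts does not.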

    Examples

    Create a histogram of a vector by using the gpucoder.atomicAdd function to parallelize the computations in generated CUDA code.

    Write an entry-point function named numberHistogram that accepts the vector input X and outputs the number of times each integer from 0 through 31 appears in X. Use a for-loop and addition to count the number of times each integer appears.

    function counts = numberHistogram(X)
    counts = zeros(1, 32);
    for i=1:numel(X)
        curr = X(i);
        if 0 <= curr && curr < 32
            counts(curr+1) = counts(curr+1)+1;
        end
    end

    The function calculates the histogram in MATLAB®. However, mapping this loop to a kernel would generate a kernel with a data race. Because multiple iterations of the loop can increment the same element of counts, multiple threads can read and write to an element of counts at the same time. To perform the increment operation without interference from other threads, use gpucoder.atomicAdd to make the addition operation atomic.

    function counts = numberHistogram(X)
    counts = zeros(1, 32);
    for i=1:numel(X)
        curr = X(i);
        if 0 <= curr && curr < 32
            counts(curr+1) = gpucoder.atomicAdd(counts(curr+1),1);
        end
    end

    Create an input variable named X that contains random integers from 0 to 31.

    X = randi([0 31],[1 2^10]);
    

    To generate a CUDA MEX function, use the codegen command.

    cfg = coder.gpuConfig("mex");
    codegen -config cfg -args {X} numberHistogram
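
    One way to check the generated MEX function is to compare its output with a histogram computed in MATLAB. This is a sketch that assumes the codegen step succeeded, that a supported CUDA GPU is available, and that the MEX function has the default name numberHistogram_mex. The variable names countsGpu and countsRef are illustrative.

    ```matlab
    % Run the generated CUDA MEX function on the same input.
    countsGpu = numberHistogram_mex(X);

    % Reference result: 32 bins with edges 0,1,...,32. Each bin includes its
    % left edge, so the integer k falls in bin k+1.
    countsRef = histcounts(X, 0:32);

    % The counts are integer-valued doubles, so an exact comparison is safe.
    assert(isequal(countsGpu, countsRef), "MEX result differs from histcounts.");
    ```

    Because every addition adds exactly 1, the order in which threads perform the atomic updates does not affect the final counts.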

    The generated CUDA code contains a kernel named numberHistogram_kernel2 that calculates the histogram. The kernel uses atomic addition to prevent threads from reading from or writing to an element of counts while another thread performs the addition operation.

    static __global__
        __launch_bounds__(128, 1) void numberHistogram_kernel2(const real_T X[1024],
                                                               real_T counts[32])
    {
      int32_T i;
      i = static_cast<int32_T>(mwGetGlobalThreadIndex());
      if (i < 1024) {
        real_T b;
        b = X[i];
        if ((b >= 0.0) && (b < 32.0)) {
          gpu_atomicAdd(&counts[static_cast<int32_T>(b + 1.0) - 1], 1.0);
        }
      }
    }

    Input Arguments

    Reference to a shared or global GPU memory location (A), specified as a scalar. Use the reference as an input and output argument of gpucoder.atomicAdd.

    Data Types: double | single | int32 | uint32 | uint64

    Operand (B), specified as a scalar.

    Data Types: double | single | int32 | uint32 | uint64
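
    The data types listed above include integer types, so the histogram counters do not have to be double. The hypothetical variant below, named numberHistogramUint32 for illustration, keeps uint32 counts. Casting the operand to the same type as the counter is a cautious assumption. This page does not state whether A and B must have matching types.

    ```matlab
    function counts = numberHistogramUint32(X)
    % Hypothetical variant of numberHistogram that stores uint32 counts.
    counts = zeros(1, 32, 'uint32');
    for i = 1:numel(X)
        curr = X(i);
        if 0 <= curr && curr < 32
            % Cast the operand so that A and B share one data type
            % (an assumption, not a documented requirement).
            counts(curr+1) = gpucoder.atomicAdd(counts(curr+1), uint32(1));
        end
    end
    end
    ```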

    Extended Capabilities

    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Code Generation
    Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

    Version History

    Introduced in R2021b