Accelerate Simulation Using GPUs

A GPU-based System object™ looks and behaves much like the non-GPU-based System objects in the Communications Toolbox™ product. The important difference is that the algorithm is executed on a graphics processing unit (GPU) rather than on a CPU. Using the GPU can accelerate your simulation.

GPUs excel at processing large quantities of data and performing computations with high compute intensity. Processing large quantities of data is one way to maximize the throughput of your GPU in a simulation. The amount of data that the GPU processes at any one time depends on the size of the data passed to the input of a GPU-based System object. Therefore, one way to maximize this data size is by processing multiple frames of data.

You can use a single GPU-based System object to process multiple data frames simultaneously, in parallel. This implementation differs from that of standard System objects. For GPU-based System objects, the number of frames that the object processes in a single call is either implied by one of the object properties or explicitly stated by using the NumFrames property of the object.

Passing MATLAB® arrays to a GPU-based System object requires transferring the initial data from a CPU to the GPU. Then, the GPU-based System object performs calculations and transfers the output data back to the CPU. This process introduces latency. When you pass data in the form of a gpuArray (Parallel Computing Toolbox) to a GPU-based System object, the object does not incur the latency from data transfer. Therefore, a GPU-based System object runs faster when you supply a gpuArray as the input.

In general, you should try to minimize the amount of data transfer between the CPU and the GPU in your simulation. For more information, see Establish Arrays on a GPU (Parallel Computing Toolbox).
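The following is a minimal sketch of this advice, not a benchmark. It assumes a supported GPU and Parallel Computing Toolbox are available, and the variable names (txBitsHost, txBits, modSig, rxBits, rxBitsHost) are illustrative only. The data moves to the GPU once, passes through two GPU-based System objects as a gpuArray, and returns to the CPU once at the end.

```matlab
M = 16;                                   % Modulation order
gpuMod = comm.gpu.PSKModulator(M,pi/16,BitInput=true);
gpuDemod = comm.gpu.PSKDemodulator(M,pi/16,BitOutput=true);

txBitsHost = randi([0 1],4000,1);         % Data created in CPU memory
txBits = gpuArray(txBitsHost);            % Single CPU-to-GPU transfer

modSig = gpuMod(txBits);                  % gpuArray input, gpuArray output
rxBits = gpuDemod(modSig);                % Intermediate data stays on the GPU

rxBitsHost = gather(rxBits);              % Single GPU-to-CPU transfer
```

Because no noise is added in this sketch, the demodulated bits should match the transmitted bits.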

GPU-Based System Object Construction

System objects for the Communications Toolbox product are located in the comm namespace and are constructed as:

H = comm.<object name>

For example, you construct a Viterbi decoder System object as:

H = comm.ViterbiDecoder

When a System object has a corresponding GPU-based implementation, the GPU-based object is located in the comm.gpu namespace and is constructed as:

H = comm.gpu.<object name>

For example, you construct a GPU-based Viterbi decoder System object as:

H = comm.gpu.ViterbiDecoder

For a list of available GPU-based implementations, see GPU Arrays Support List for System Objects.
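Because the two namespaces mirror each other, one possible pattern is to select the implementation at run time. The sketch below assumes that the canUseGPU function is available in your MATLAB release. The variable name decoder is illustrative only.

```matlab
% Construct the GPU-based decoder when a usable GPU is present;
% otherwise fall back to the CPU-based decoder.
if canUseGPU
    decoder = comm.gpu.ViterbiDecoder;
else
    decoder = comm.ViterbiDecoder;
end
disp(class(decoder))
```

This sketch changes only the construction step; whether the rest of a simulation can stay the same depends on which properties and inputs both objects support.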

Process Multiple Data Frames Using GPU-Based System Objects

You can use a single GPU System object™ to process multiple data frames simultaneously. Some GPU-based System objects, such as the LDPC decoder, can infer the number of frames from the object properties. Other GPU-based System objects, such as the Viterbi decoder, include a NumFrames property to define the number of frames present in the input data.

First, simultaneously process two data frames using a GPU-based LDPC decoder System object™. The ParityCheckMatrix property determines the frame size. The frame size and the input data vector length determine the number of frames processed by the LDPC decoder object.

numframes = 2;

ldpcEnc = comm.LDPCEncoder;
ldpcGPUDec = comm.gpu.LDPCDecoder;
ldpcDec = comm.LDPCDecoder;

msg = randi([0 1],32400,2);

for ii=1:numframes
    encout(:,ii) = ldpcEnc(msg(:,ii));
end

% Single ended to bipolar (for LLRs)
encout = 1-2*encout;

% Decode on the CPU
for ii=1:numframes
    cout(:,ii) = ldpcDec(encout(:,ii));
end

% Multiframe decode on the GPU
gout = ldpcGPUDec(encout(:));

% Check equality
isequal(gout,cout(:))
ans = logical
   1
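The frame-count inference described above can be checked with simple arithmetic. This sketch assumes that the decoder uses the rate 1/2 DVB-S.2 parity-check matrix returned by dvbs2ldpc(1/2), which is consistent with the 32400-bit messages in the example. The variable names are illustrative, and llr is placeholder data rather than real decoder input.

```matlab
H = dvbs2ldpc(1/2);                        % Parity-check matrix (sparse)
[numParityBits,blockLength] = size(H);     % Rows = parity bits, columns = codeword length
msgLength = blockLength - numParityBits;   % Message bits per frame

numframes = 2;
llr = ones(numframes*blockLength,1);       % Placeholder input: two stacked codewords

% The input length divided by the codeword length gives the frame count
inferredFrames = numel(llr)/blockLength;
fprintf('Codeword length = %d, message length = %d, frames = %d\n', ...
    blockLength,msgLength,inferredFrames)
```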

Next, process multiple data frames using the NumFrames property of the GPU-based Viterbi decoder System object. For a Viterbi decoder, the frame size of your system cannot be inferred from an object property. Instead, you must define the number of frames present in the input data by using the NumFrames property of the Viterbi decoder object.

numframes = 10;
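% Use the same termination method as the GPU-based decoder
convEncoder = comm.ConvolutionalEncoder( ...
    TerminationMethod="Terminated");
vitDecoder = comm.ViterbiDecoder( ...
    TerminationMethod="Terminated");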

Create a GPU-based Viterbi decoder System object using the NumFrames property.

vitGPUDecoder = comm.gpu.ViterbiDecoder( ...
    TerminationMethod="Terminated", ...
    NumFrames=numframes);
msg = randi([0 1],200,numframes);
for ii=1:numframes
    convEncOut(:,ii) = 1-2*convEncoder(msg(:,ii));
end
% Decode on the CPU
for ii=1:numframes
    cVitOut(:,ii) = vitDecoder(convEncOut(:,ii));
end
% Decode on the GPU
gVitOut = vitGPUDecoder(convEncOut(:));

% Check equality
isequal(gVitOut,cVitOut(:))
ans = logical
   1
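The multiframe decoder returns all frames stacked in one column vector. If you need the decoded data per frame again, one option is to reshape the output. The sketch below is self-contained and assumes a supported GPU. The names frameLen and perFrame are illustrative only. It also assumes that, with terminated trellises, the message bits occupy the first frameLen rows of each decoded frame and any tail bits follow them.

```matlab
numframes = 4;
frameLen = 200;

enc = comm.ConvolutionalEncoder(TerminationMethod="Terminated");
gpuDec = comm.gpu.ViterbiDecoder( ...
    TerminationMethod="Terminated", ...
    NumFrames=numframes);

% Encode each frame separately and map bits to bipolar values
msg = randi([0 1],frameLen,numframes);
for ii = 1:numframes
    encOut(:,ii) = 1-2*enc(msg(:,ii));
end

% Decode all frames in one call, then restore one column per frame
decoded = gpuDec(encOut(:));
perFrame = reshape(decoded,[],numframes);

% Compare only the message portion of each frame
isequal(perFrame(1:frameLen,:),msg)
```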

Pass Data to GPU-Based System Objects Using gpuArray Input

In this example, you transmit 1/2 rate convolutionally encoded 16-PSK-modulated data through an AWGN channel, demodulate and decode the received data, and assess the error rate of the received data. For this implementation, you use the GPU-based Viterbi decoder System object™ to process multiple signal frames in a single call and then use gpuArray (Parallel Computing Toolbox) objects to pass data into and out of the GPU-based System objects.

Create GPU-based System objects for PSK modulation and demodulation, convolutional encoding, Viterbi decoding, and AWGN. Create a System object for error rate calculation.

M = 16; % Modulation order
numframes = 100;

gpuconvenc = comm.gpu.ConvolutionalEncoder;
gpupskmod = comm.gpu.PSKModulator(M,pi/16,BitInput=true);
gpupskdemod = comm.gpu.PSKDemodulator(M,pi/16,BitOutput=true);
gpuawgn = comm.gpu.AWGNChannel( ...
    NoiseMethod='Signal to noise ratio (SNR)',SNR=30);
gpuvitdec = comm.gpu.ViterbiDecoder( ...
    InputFormat='Hard', ...
    TerminationMethod='Truncated', ...
    NumFrames=numframes);
errorrate = comm.ErrorRate(ComputationDelay=0,ReceiveDelay=0);

Due to the computational complexity of the Viterbi decoding algorithm, loading multiple frames of signal data on the GPU and processing them in one call can reduce overall simulation time. To enable this implementation, the GPU-based Viterbi decoder System object contains a NumFrames property. Instead of using an external for-loop to process individual frames of data, you use the NumFrames property to configure the GPU-based Viterbi decoder System object to process multiple data frames. Generate numframes of binary data frames. To efficiently manage the data frames for processing by the GPU-based System objects, represent the transmission data frames as a gpuArray object.

numsymbols = 50;
rate = 1/2; 
dataA = gpuArray.randi([0 1],rate*numsymbols*log2(M),numframes);
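For comparison, the sketch below shows two other ways to obtain the same kind of data on the GPU. It assumes a supported GPU and that your release accepts the "gpuArray" option of randi. The variable names are illustrative only. Generating the data directly on the GPU, as the example does, avoids the CPU-to-GPU transfer that the first approach requires.

```matlab
M = 16;
numframes = 100;
numsymbols = 50;
rate = 1/2;
frameLen = rate*numsymbols*log2(M);       % Message bits per frame

% Option 1: create the data on the CPU, then transfer it to the GPU
dataHost = randi([0 1],frameLen,numframes);
dataViaTransfer = gpuArray(dataHost);

% Option 2: create the data directly in GPU memory (no transfer)
dataOnGpu = randi([0 1],frameLen,numframes,"gpuArray");

size(dataOnGpu)
```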

The error rate object does not support gpuArray objects or multichannel data, so you must retrieve the arrays from the GPU by using the gather (Parallel Computing Toolbox) function before you compute the error rate. Perform the GPU-based encoding, modulation, AWGN, and demodulation on each frame inside a for-loop.

for ii = 1:numframes
    encodedData = gpuconvenc(dataA(:,ii));
    modsig = gpupskmod(encodedData);
    noisysig = gpuawgn(modsig);
    demodsig(:,ii) = gpupskdemod(noisysig);
end

The GPU-based Viterbi decoder performs multiframe processing without a for-loop.

rxbits = gpuvitdec(demodsig(:));

errorStats = errorrate(gather(dataA(:)),gather(rxbits));
fprintf('BER = %f\nNumber of errors = %d\nTotal bits = %d', ...
    errorStats(1), errorStats(2), errorStats(3))
BER = 0.009800
Number of errors = 98
Total bits = 10000
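The reported totals follow from the frame dimensions. This short check reuses the parameter values from the example and the error count printed above. The intermediate variable names are illustrative only.

```matlab
M = 16;
numframes = 100;
numsymbols = 50;
rate = 1/2;

bitsPerSymbol = log2(M);                        % 4 bits per 16-PSK symbol
codedBitsPerFrame = numsymbols*bitsPerSymbol;   % 200 coded bits per frame
msgBitsPerFrame = rate*codedBitsPerFrame;       % 100 message bits per frame
totalBits = msgBitsPerFrame*numframes;          % 10000 bits compared in total

numErrors = 98;                                 % Error count reported above
ber = numErrors/totalBits;                      % 0.0098
fprintf('Total bits = %d, BER = %f\n',totalBits,ber)
```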

MATLAB System Block Support for GPU-Based System Objects

If you are using MATLAB System (Simulink) blocks in your implementation, you can include these GPU-based System objects in them.

  • comm.gpu.AWGNChannel

  • comm.gpu.BlockDeinterleaver

  • comm.gpu.BlockInterleaver

  • comm.gpu.ConvolutionalDeinterleaver

  • comm.gpu.ConvolutionalEncoder

  • comm.gpu.ConvolutionalInterleaver

  • comm.gpu.PSKDemodulator

  • comm.gpu.PSKModulator

  • comm.gpu.TurboDecoder

  • comm.gpu.ViterbiDecoder

The GPU System objects must be simulated using Interpreted execution. You must select this option explicitly on the block mask; the default value is Code generation.

MATLAB System block mask with the Interpreted execution option selected
