GPU Coder

Generate CUDA code for NVIDIA GPUs

GPU Coder™ generates optimized CUDA® code from MATLAB® code for deep learning, embedded vision, and autonomous systems. The generated code calls optimized NVIDIA® CUDA libraries, including cuDNN, cuSolver, and cuBLAS. It can be integrated into your project as source code, static libraries, or dynamic libraries, and can be used for prototyping on GPUs such as the NVIDIA Tesla® and NVIDIA Tegra®. You can use the generated CUDA code within MATLAB to accelerate computationally intensive portions of your MATLAB code. GPU Coder also lets you incorporate legacy CUDA code into your MATLAB algorithms and into the generated code.

When used with Embedded Coder®, GPU Coder lets you verify the numerical behavior of the generated code via software-in-the-loop (SIL) testing.

Generate Fast, Flexible CUDA Code

Generate optimized CUDA code. Deploy code royalty-free.

Deploy Algorithms Royalty-Free

Compile and run your generated code on popular NVIDIA GPUs, from desktop systems to data centers to embedded hardware. The generated code is royalty-free—deploy it in commercial applications to your customers at no charge.

GPU Coder Success Stories

Learn how engineers and scientists in a variety of industries use GPU Coder to generate CUDA code for their applications.

Airbus prototypes automated detection of defects on NVIDIA Jetson TX2.

Generate Code from Supported Toolboxes and Functions

GPU Coder generates code from a broad range of MATLAB language features that design engineers use to develop algorithms as components of larger systems. This includes over 390 operators and functions from MATLAB and companion toolboxes.

MATLAB language and toolbox support for code generation.

Incorporate Legacy Code

Use legacy code integration capabilities to incorporate trusted or highly optimized CUDA code into your MATLAB algorithms for testing in MATLAB, then call the same CUDA code from the generated code as well.

Incorporating existing CUDA code into generated code.

Generate CUDA Code from Deep Learning Networks

Deploy trained deep learning networks with Deep Learning Toolbox.

Deploy End-To-End Deep Learning Algorithms

Deploy a variety of trained deep learning networks such as ResNet-50 and SegNet from Deep Learning Toolbox™ to NVIDIA GPUs. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Generate Optimized Code for Inference

GPU Coder generates code with a smaller footprint compared with other deep learning solutions because it only generates the code needed to run inference with your specific algorithm. The generated code calls optimized libraries, including TensorRT™ and cuDNN.

Single image inference with VGG-16 on a Titan V GPU using cuDNN.

Optimize Further Using TensorRT

Generate code that integrates with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime. Use INT8 or FP16 data types for an additional performance boost over the standard FP32 data type.

Improving execution speed with TensorRT and INT8 data types.

Deep Learning Quantization

Quantize your deep learning network to INT8 and analyze how quantizing the weights and biases of selected layers affects accuracy, using the Model Quantization Library support package.
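To make the accuracy tradeoff concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is only an illustration of the general idea, not the method used by GPU Coder, TensorRT, or the support package; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


def max_abs_error(values):
    """Largest absolute difference introduced by a quantize/dequantize round trip."""
    quantized, scale = quantize_int8(values)
    restored = dequantize(quantized, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical layer weights, chosen only for illustration.
weights = [0.82, -0.41, 0.05, -1.27, 0.63]
q_weights, scale = quantize_int8(weights)
error = max_abs_error(weights)
```

The round-trip error per value is bounded by half the scale factor, so layers with a wide weight range lose more precision than layers with a narrow one. That is one reason to analyze layers selectively.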

Optimize the Generated Code

Take advantage of optimizations that are automatically applied to code generated by GPU Coder. Use design patterns to further increase performance.

Minimize CPU-GPU Memory Transfers and Optimize Memory Usage

GPU Coder automatically analyzes, identifies, and partitions segments of MATLAB code to run on either the CPU or GPU. It also minimizes the number of data copies between CPU and GPU. Use profiling tools to identify other potential bottlenecks.

Profile reports identifying potential bottlenecks.

Invoke Optimized Libraries

Code generated with GPU Coder calls optimized NVIDIA CUDA libraries, including TensorRT, cuDNN, cuSolver, cuFFT, cuBLAS, and Thrust. Code generated from MATLAB toolbox functions is mapped to optimized libraries whenever possible.

Generated code calling functions in the optimized cuFFT CUDA library.

Use Design Patterns for Further Acceleration

Design patterns such as stencil processing use shared memory to improve memory bandwidth. They are applied automatically when using certain functions such as convolution. You can also manually invoke them using specific pragmas.

The stencil processing design pattern.
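The idea behind stencil processing can be sketched in plain Python. Each tile of the input, plus a halo of neighboring elements, is staged once into a small local buffer (standing in here for GPU shared memory), and every output in the tile is then computed from that buffer. This is a conceptual sketch under those assumptions, not the code GPU Coder generates; the function names and tile size are hypothetical.

```python
def stencil_direct(data, radius):
    """Reference: each output sums the inputs within `radius` of it (zero-padded)."""
    n = len(data)
    out = []
    for i in range(n):
        total = 0.0
        for offset in range(-radius, radius + 1):
            j = i + offset
            if 0 <= j < n:
                total += data[j]
        out.append(total)
    return out


def stencil_tiled(data, radius, tile_size):
    """Same result, but each tile reads its inputs once into a local buffer."""
    n = len(data)
    out = [0.0] * n
    for start in range(0, n, tile_size):
        stop = min(start + tile_size, n)
        # Stage the tile plus its halo; out-of-range neighbors become zero.
        local = [data[j] if 0 <= j < n else 0.0
                 for j in range(start - radius, stop + radius)]
        for i in range(start, stop):
            center = i - start + radius  # position of element i in `local`
            out[i] = sum(local[center - radius:center + radius + 1])
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
result = stencil_tiled(signal, radius=1, tile_size=2)
```

On a GPU, the benefit comes from neighboring threads reusing the staged values rather than each thread re-reading them from global memory.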

Prototype on Hardware

Get to hardware fast with automatic conversion of your algorithm to CUDA code.

Prototype on NVIDIA Jetson and DRIVE Platforms

Automate cross-compilation and deployment of generated code onto NVIDIA Jetson™ and DRIVE™ platforms using GPU Coder Support Package for NVIDIA GPUs.

Prototyping on the NVIDIA Jetson platform.

Access Peripherals and Sensors from MATLAB and Generated Code

Remotely communicate with the NVIDIA target from MATLAB to acquire data from webcams and other supported peripherals for early prototyping. Build and deploy your algorithm along with peripheral interface code to the board for standalone execution.

Access peripherals and sensors from MATLAB and generated code.

Move from Prototyping to Production

Use GPU Coder with Embedded Coder to interactively trace your MATLAB code side by side with the generated CUDA code. Verify the numerical behavior of the generated code using software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Interactive traceability report using GPU Coder with Embedded Coder.
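At its core, verifying numerical behavior means comparing the outputs of the generated code against a reference within a tolerance. The sketch below shows that comparison in plain Python. It does not reproduce the SIL or PIL workflow itself; the function name, tolerances, and sample values are hypothetical.

```python
import math


def compare_outputs(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Return the indices where `candidate` differs from `reference` beyond tolerance."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    return [
        i
        for i, (ref, cand) in enumerate(zip(reference, candidate))
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical values: outputs of the original algorithm and of the generated code.
reference_outputs = [1.0, 2.0, 3.0]
generated_outputs = [1.0000001, 2.0, 3.1]
mismatches = compare_outputs(reference_outputs, generated_outputs)
```

A tolerance-based check is used rather than exact equality because floating-point results can legitimately differ slightly between CPU and GPU execution.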

Accelerate Algorithms

Generate CUDA code and compile it for use inside MATLAB.

Accelerate Algorithms Using GPUs

Call generated CUDA code as a MEX function from your MATLAB code to speed execution, though performance will vary depending on the nature of your MATLAB code. Profile generated MEX functions to identify bottlenecks and focus your optimization efforts.

Latest Features

cuBLAS Support

Generate CUDA code for strided and batched matrix multiply

Row-Major Array Layout

Simplify interfacing generated deep learning code with target libraries by storing arrays in row-major layout

Signal Processing Toolbox Code Generation

Generate code for FFT-based FIR filtering and short-time Fourier transforms using fftfilt, stft, and istft

NVIDIA Hardware Support

Access onboard camera modules and generate CUDA code for the VideoReader function

Single Shot Object Detection (SSD) Networks

Perform object detection on NVIDIA GPUs by using a single shot multibox detector

Long Short-Term Memory (LSTM) Networks

Generate code for bidirectional and stateful LSTM

Multi-Output Networks

Generate code for networks with multiple outputs

Deep Learning Networks

Generate code for Darknet-19, Darknet-53, Inception-ResNet-v2, NASNet-Large, and NASNet-Mobile

See the release notes for details on any of these features and corresponding functions.
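As a rough illustration of two items listed above, strided and batched matrix multiply and row-major array layout, the following plain-Python sketch multiplies a batch of matrices that are packed back to back in flat buffers. The strides give the distance between consecutive matrices in each buffer. It mirrors the idea only, not the cuBLAS API or the generated code; the function name and parameters are hypothetical.

```python
def strided_batched_matmul(a, b, m, k, n, batch, stride_a, stride_b, stride_c):
    """Compute C[p] = A[p] @ B[p] for each batch index p.

    A[p] is m-by-k, B[p] is k-by-n, and C[p] is m-by-n. All matrices are
    stored row-major in flat lists, so element (i, j) of a matrix with
    `cols` columns sits at offset i * cols + j from the matrix start.
    """
    c = [0.0] * ((batch - 1) * stride_c + m * n)
    for p in range(batch):
        a0, b0, c0 = p * stride_a, p * stride_b, p * stride_c
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for t in range(k):
                    acc += a[a0 + i * k + t] * b[b0 + t * n + j]
                c[c0 + i * n + j] = acc
    return c


# Two 2-by-2 problems packed into each buffer (stride of 4 elements per matrix).
a_flat = [1.0, 2.0, 3.0, 4.0,   5.0, 6.0, 7.0, 8.0]
b_flat = [1.0, 0.0, 0.0, 1.0,   2.0, 0.0, 0.0, 2.0]
c_flat = strided_batched_matmul(a_flat, b_flat, m=2, k=2, n=2, batch=2,
                                stride_a=4, stride_b=4, stride_c=4)
```

Handling the whole batch in one call, rather than one call per matrix, is what lets a GPU library keep many small multiplications in flight at once.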