GPU Coder

Generate CUDA code for NVIDIA GPUs

GPU Coder™ generates optimized CUDA® code from MATLAB® code and Simulink® models. The generated code includes CUDA kernels for parallelizable parts of your deep learning, embedded vision, and signal processing algorithms. For high performance, the generated code calls optimized NVIDIA® CUDA libraries, including TensorRT™, cuDNN, cuFFT, cuSolver, and cuBLAS. The code can be integrated into your project as source code, static libraries, or dynamic libraries, and it can be compiled for desktops, servers, and GPUs embedded on NVIDIA Jetson™, NVIDIA DRIVE™, and other platforms. You can use the generated CUDA code within MATLAB to accelerate deep learning networks and other computationally intensive portions of your algorithm. GPU Coder lets you incorporate handwritten CUDA code into your algorithms and into the generated code.

When used with Embedded Coder®, GPU Coder lets you verify the numerical behavior of the generated code via software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Get Started:

Generate Fast, Flexible CUDA Code
Generate CUDA Code from Simulink Models
Generate CUDA Code from Deep Learning Networks
Optimize the Generated Code
Prototype on Hardware
Accelerate Algorithms

What Is GPU Coder?

Generate Fast, Flexible CUDA Code

Generate optimized CUDA code. Deploy code royalty-free.

Deploy Algorithms Royalty-Free

Compile and run your generated code on popular NVIDIA GPUs, from desktop systems to data centers to embedded hardware. The generated code is royalty-free—deploy it in commercial applications to your customers at no charge.

Generate CUDA Code for a Fog Rectification Algorithm (2:22)

GPU Code Generation: The Mandelbrot Set

GPU Coder Success Stories

Learn how engineers and scientists in a variety of industries use GPU Coder to generate CUDA code for their applications.

Drass deploys a maritime optical tracking and obstacle awareness system, built on a YOLO v2 network, into a Visual Studio application running on NVIDIA GPUs.

Airbus Prototypes Aircraft Inspection Demonstrator Running on NVIDIA Jetson TX2 to Automate Detection of Defects

Airbus prototypes automated detection of defects on NVIDIA Jetson TX2.

Generate Code from Supported Toolboxes and Functions

GPU Coder generates code from a broad range of MATLAB language features that design engineers use to develop algorithms as components of larger systems. This includes hundreds of operators and functions from MATLAB and companion toolboxes.

Supported Toolboxes and Functions

MATLAB Language Features Support

MATLAB language and toolbox support for code generation.

Incorporate Legacy Code

Use legacy code integration capabilities to incorporate trusted or highly optimized CUDA code into your MATLAB algorithms for testing in MATLAB. Then call the same CUDA code from the generated code as well.

Legacy Code Integration

Incorporating existing CUDA code into generated code.

Generate CUDA Code from Simulink Models

Create models in Simulink and generate optimized CUDA code.

Run Simulations and Generate Optimized Code for NVIDIA GPUs

When used with Simulink Coder™, GPU Coder accelerates compute-intensive portions of MATLAB Function blocks in your Simulink models on NVIDIA GPUs. You can then generate optimized CUDA code from the Simulink model and deploy it to your NVIDIA GPU target.

Simulation Acceleration by Using GPU Coder

Code Generation from Simulink Models by Using GPU Coder

Targeting NVIDIA Embedded Boards

Simulink model of a Sobel edge detector running on a GPU.

Deploy End-to-End Deep Learning Algorithms

Use a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox™ in your Simulink model and deploy to NVIDIA GPUs. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Supported Networks and Layers

Deep Learning in Simulink Using MATLAB Function Block

Deep Learning in Simulink for NVIDIA GPUs: Generate CUDA Code Using GPU Coder

Log Signals, Tune Parameters, and Numerically Verify Code Behavior

When used with Simulink Coder, GPU Coder enables you to log signals and tune parameters in real time using external mode simulations. Use Embedded Coder with GPU Coder to run software-in-the-loop and processor-in-the-loop tests that numerically verify the generated code matches the behavior of the simulation.
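The idea behind numerical equivalence testing can be illustrated with a minimal Python sketch. It is not the SIL/PIL workflow itself and uses no MathWorks API; the function names, tolerances, and sample values are all hypothetical. It only shows the kind of comparison such a test performs: the output recorded from the simulation is checked element by element against the output of the generated code, within a floating-point tolerance.

```python
import math


def max_abs_error(reference, candidate):
    """Largest absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def numerically_equivalent(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """True if every element pair agrees within the given tolerances.

    The tolerance values are illustrative assumptions, not recommended settings.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical data: output logged from the simulation versus output
# produced by the generated code for the same input.
simulation_output = [0.0, 0.5, 1.0, 1.5]
generated_output = [0.0, 0.5000001, 0.9999999, 1.5]

equivalent = numerically_equivalent(simulation_output, generated_output)
worst_case = max_abs_error(simulation_output, generated_output)
```

A tolerance is used rather than exact equality because GPU and CPU floating-point results can differ in the last few bits, for example when operations are reordered.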

Parameter Tuning and Signal Monitoring Using External Mode

Numerical Equivalence Testing

Deep Learning in Simulink for NVIDIA GPUs: Classification of ECG Signals

Generate CUDA Code from Deep Learning Networks

Deploy trained deep learning networks with Deep Learning Toolbox.

Deploy End-to-End Deep Learning Algorithms

Deploy a variety of trained deep learning networks (including ResNet-50, SegNet, and LSTM) from Deep Learning Toolbox to NVIDIA GPUs. Use predefined deep learning layers or define custom layers for your specific application. Generate code for preprocessing and postprocessing along with your trained deep learning networks to deploy complete algorithms.

Supported Networks and Layers

Real-Time Object Detection with YOLO v2 Using GPU Coder (4:24)

Code Generation for Object Detection Using YOLO v3 Deep Learning

Code Generation for Semantic Segmentation Network by Using U-net

How to Generate CUDA Code for a Keras-TensorFlow Model

Generate Optimized Code for Inference

GPU Coder generates code with a smaller footprint than other deep learning solutions because it generates only the code needed to run inference with your specific algorithm. The generated code calls optimized libraries, including TensorRT and cuDNN.

Lane Detection Optimized with GPU Coder

Single image inference with VGG-16 on a Titan V GPU using cuDNN.

Optimize Further Using TensorRT

Generate code that integrates with NVIDIA TensorRT, a high-performance deep learning inference optimizer and runtime. Use INT8 or FP16 data types for an additional performance boost over the standard FP32 data type.

Pedestrian Detection on NVIDIA GPUs with TensorRT (1:34)

Deep Learning Prediction with NVIDIA TensorRT

Deep Learning on Jetson AGX Xavier Using MATLAB, GPU Coder, and TensorRT (24:40)

Using MATLAB and TensorRT on NVIDIA GPUs

Improving execution speed with TensorRT and INT8 data types.

Deep Learning Quantization

Quantize your deep learning network to reduce memory usage and increase inference performance. Analyze and visualize the tradeoff between increased performance and inference accuracy using the Deep Network Quantizer app.
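To make the memory-versus-accuracy tradeoff concrete, here is a minimal Python sketch of symmetric linear INT8 quantization. It is a generic textbook scheme chosen for illustration and is not a description of how the Deep Network Quantizer app or TensorRT calibrates a network; the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Assumption: symmetric range, scale chosen so the largest magnitude maps to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


# Hypothetical weights. Stored as FP32 they take 4 bytes each;
# stored as INT8 they take 1 byte each plus one shared scale.
weights = [0.02, -1.27, 0.64, 0.0]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding to the nearest integer step bounds the error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The storage drops to roughly a quarter, and the cost is a rounding error of at most half a quantization step per value. Whether that error matters for inference accuracy depends on the network, which is why the tradeoff needs to be measured.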

INT8 Quantization with Deep Network Quantizer

Quantization of Deep Neural Networks

What Is int8 Quantization and Why Is It Popular for Deep Neural Networks?

Deep Network Quantization and Deployment Using Deep Learning Toolbox Model Quantization Library

Optimize the Generated Code

GPU Coder automatically optimizes the generated code. Use design patterns to further increase performance.

Minimize CPU-GPU Memory Transfers and Optimize Memory Usage

GPU Coder automatically analyzes, identifies, and partitions segments of MATLAB code to run on either the CPU or GPU. It also minimizes the number of data copies between CPU and GPU. Use profiling tools to identify other potential bottlenecks.

GPU Programming Paradigm

Kernel Creation

GPU Memory Allocation and Minimization

GPU Execution Profiling of the Generated Code

Profile reports identifying potential bottlenecks.

Invoke Optimized Libraries

Code generated with GPU Coder calls optimized NVIDIA CUDA libraries, including TensorRT, cuDNN, cuSolver, cuFFT, cuBLAS, and Thrust. Code generated from MATLAB toolbox functions is mapped to optimized libraries whenever possible.

Kernels from Library Calls

NVIDIA TensorRT

NVIDIA cuDNN

NVIDIA cuFFT

Generated code calling functions in the optimized cuFFT CUDA library.

Use Design Patterns for Further Acceleration

Design patterns such as stencil processing use shared memory to improve memory bandwidth. They are applied automatically when using certain functions such as convolution. You can also manually invoke them using specific pragmas.
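As a rough illustration of what a stencil computation is, the following Python sketch applies a small window of weights around every element of a 2-D array. It is a sequential, CPU-only sketch and not GPU Coder output; the function name, the zero-padding choice at the borders, and the sample data are assumptions made for the example. The window is applied without flipping, so strictly this is correlation rather than convolution.

```python
def apply_stencil(image, kernel):
    """Compute each output element from the neighborhood of the matching input element.

    Elements outside the image are treated as zero (an assumed border policy).
    """
    rows, cols = len(image), len(image[0])
    half = len(kernel) // 2  # assumes a square kernel with odd side length
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[dr + half][dc + half]
            output[r][c] = acc
    return output


# Hypothetical 3x3 input and a 3x3 window of ones (a neighborhood sum).
image = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
kernel = [[1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0]]

result = apply_stencil(image, kernel)
```

Every iteration of the two outer loops is independent, so on a GPU each output element can be assigned to its own thread. Neighboring threads read heavily overlapping input windows, which is why staging a tile of the input in shared memory cuts redundant global-memory reads.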

Design Patterns

Stencil Processing on GPU

The stencil processing design pattern.

Prototype on Hardware

Get to hardware fast with automatic conversion of your algorithm to CUDA code.

Prototype on NVIDIA Jetson and DRIVE Platforms

Automate cross-compilation and deployment of generated code onto NVIDIA Jetson and DRIVE platforms using GPU Coder Support Package for NVIDIA GPUs.

NVIDIA Tegra Support from GPU Coder

NVIDIA DRIVE Support from GPU Coder

Using GPU Coder to Prototype and Deploy on NVIDIA Drive, Jetson (2:54)

Semantic Segmentation on NVIDIA DRIVE

Prototyping on the NVIDIA Jetson platform.

Access Peripherals and Sensors from MATLAB and Generated Code

Remotely communicate with the NVIDIA target from MATLAB to acquire data from webcams and other supported peripherals for early prototyping. Deploy your algorithm along with peripheral interface code to the board for standalone execution.

Sobel Edge Detection Using Webcam on NVIDIA Jetson

Deployment and Classification of Webcam Images on NVIDIA Jetson TX2 Platform

Access peripherals and sensors from MATLAB and generated code.

Move from Prototyping to Production

Use GPU Coder with Embedded Coder to interactively trace your MATLAB code side-by-side with the generated CUDA code. Verify the numerical behavior of the generated code running on the hardware using software-in-the-loop (SIL) and processor-in-the-loop (PIL) testing.

Trace Between MATLAB Code and Generated CUDA Code

Verify Correctness of the Generated Code

Processor-in-the-Loop Execution with the GPU Coder App

Execution Time Profiling for PIL

Interactive traceability report using GPU Coder with Embedded Coder.

Accelerate Algorithms

Generate CUDA code and compile it for use inside MATLAB and Simulink.

Accelerate Algorithms Using GPUs in MATLAB

Call generated CUDA code as a MEX function from your MATLAB code to speed execution, though performance will vary depending on the nature of your MATLAB code. Profile generated MEX functions to identify bottlenecks and focus your optimization efforts.

GPU Code Generation: The Mandelbrot Set

GPU Execution Profiling of the Generated Code

Accelerate Radar Simulations on NVIDIA GPUs Using GPU Coder (3:24)