Supported Networks, Layers, and Classes
Supported Pretrained Networks
GPU Coder™ supports code generation for series and directed acyclic graph (DAG) convolutional neural networks (CNNs or ConvNets). You can generate code for any trained convolutional neural network whose layers are supported for code generation. See Supported Layers. You can train a convolutional neural network on a CPU, a GPU, or multiple GPUs by using the Deep Learning Toolbox™, or you can use one of the pretrained networks listed in the table, and then generate CUDA® code.
Network Name | Description | cuDNN | TensorRT | ARM® Compute Library for Mali GPU |
---|---|---|---|---|
AlexNet | AlexNet convolutional neural network. For the pretrained AlexNet model, see The syntax | Yes | Yes | Yes |
Caffe Network | Convolutional neural network models from Caffe. For importing a pretrained network from Caffe, see | Yes | Yes | Yes |
Darknet-19 | Darknet-19 convolutional neural network. For more information, see The syntax | Yes | Yes | Yes |
Darknet-53 | Darknet-53 convolutional neural network. For more information, see The syntax | Yes | Yes | Yes |
DeepLab v3+ | DeepLab v3+ convolutional neural network. For more information, see | Yes | Yes | No |
DenseNet-201 | DenseNet-201 convolutional neural network. For the pretrained DenseNet-201 model, see The syntax | Yes | Yes | Yes |
EfficientNet-b0 | EfficientNet-b0 convolutional neural network. For the pretrained EfficientNet-b0 model, see The syntax | Yes | Yes | Yes |
GoogLeNet | GoogLeNet convolutional neural network. For the pretrained GoogLeNet model, see The syntax | Yes | Yes | Yes |
Inception-ResNet-v2 | Inception-ResNet-v2 convolutional neural network. For the pretrained Inception-ResNet-v2 model, see | Yes | Yes | No |
Inception-v3 | Inception-v3 convolutional neural network. For the pretrained Inception-v3 model, see The syntax | Yes | Yes | Yes |
MobileNet-v2 | MobileNet-v2 convolutional neural network. For the pretrained MobileNet-v2 model, see The syntax | Yes | Yes | Yes |
NASNet-Large | NASNet-Large convolutional neural network. For the pretrained NASNet-Large model, see | Yes | Yes | No |
NASNet-Mobile | NASNet-Mobile convolutional neural network. For the pretrained NASNet-Mobile model, see | Yes | Yes | No |
ResNet | ResNet-18, ResNet-50, and ResNet-101 convolutional neural networks. For the pretrained ResNet models, see The syntax | Yes | Yes | Yes |
SegNet | Multi-class pixelwise segmentation network. For more information, see | Yes | Yes | No |
SqueezeNet | Small deep neural network. For the pretrained SqueezeNet models, see The syntax | Yes | Yes | Yes |
VGG-16 | VGG-16 convolutional neural network. For the pretrained VGG-16 model, see The syntax | Yes | Yes | Yes |
VGG-19 | VGG-19 convolutional neural network. For the pretrained VGG-19 model, see The syntax | Yes | Yes | Yes |
Xception | Xception convolutional neural network. For the pretrained Xception model, see The syntax | Yes | Yes | Yes |
YOLO v2 | You only look once version 2 convolutional neural network based object detector. For more information, see | Yes | Yes | Yes |
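As a rough illustration of how a pretrained network from the table might be turned into CUDA code, the following sketch builds a CUDA MEX function for SqueezeNet with the cuDNN library. The entry-point name `examplePredict` is hypothetical and its assumed contents are shown in the comments. The sketch also assumes that the support package for the pretrained model and the required GPU Coder deep learning support are installed.

```matlab
% Assumed contents of examplePredict.m (hypothetical entry-point file):
%
%   function out = examplePredict(in) %#codegen
%   persistent net;
%   if isempty(net)
%       % Load the pretrained network once and reuse it across calls
%       net = coder.loadDeepLearningNetwork('squeezenet');
%   end
%   out = predict(net, in);
%   end

% Create a GPU code configuration object that builds a MEX function
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';   % deep learning code generation targets C++

% Select the deep learning library; 'tensorrt' is the other NVIDIA option
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

% SqueezeNet takes 227-by-227-by-3 images as input
codegen -config cfg examplePredict -args {ones(227,227,3,'single')} -report
```

Switching the argument of `coder.DeepLearningConfig` is usually the only change needed to target a different library, provided that the table lists the network as supported for that library.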
Supported Layers
The following layers are supported for code generation by GPU Coder for the target deep learning libraries specified in the table.
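To check layer support programmatically rather than by reading the tables, one option is `coder.getDeepLearningLayers`. The sketch below assumes that this function is available in your release, that it accepts the `'TargetLibrary'` name-value argument, and that the relevant support packages are installed. The variable names are illustrative.

```matlab
% Query the layers supported for each NVIDIA target library
cudnnLayers = coder.getDeepLearningLayers('TargetLibrary', 'cudnn');
trtLayers   = coder.getDeepLearningLayers('TargetLibrary', 'tensorrt');

% Layers reported for cuDNN but not for TensorRT
onlyCudnn = setdiff(cudnnLayers, trtLayers);
disp(onlyCudnn);
```

The result depends on the installed release, so treat the tables in this topic as the reference and the query as a quick cross-check.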
Input Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| An image input layer inputs 2-D images to a network and applies data normalization. Code generation does not support | Yes | Yes | Yes |
| A sequence input layer inputs sequence data to a network. The cuDNN library supports vector and 2-D image sequences. The TensorRT library supports only vector input sequences. For vector sequence inputs, the number of features must be a constant during code generation. For image sequence inputs, the height, width, and the number of channels must be a constant during code generation. Code generation does not support | Yes | Yes | No |
| A feature input layer inputs feature data to a network and applies data normalization. | Yes | Yes | Yes |
Convolution and Fully Connected Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| A 2-D convolutional layer applies sliding convolutional filters to the input. | Yes | Yes | Yes |
| A fully connected layer multiplies the input by a weight matrix and then adds a bias vector. | Yes | Yes | No |
| A 2-D grouped convolutional layer separates the input channels into groups and applies sliding convolutional filters. Use grouped convolutional layers for channel-wise separable (also known as depth-wise separable) convolution. Code generation for the ARM Mali GPU is not supported for a 2-D grouped convolution layer that has the | Yes | Yes | Yes |
| A transposed 2-D convolution layer upsamples feature maps. | Yes | Yes | Yes |
Sequence Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| A sequence input layer inputs sequence data to a network. The cuDNN library supports vector and 2-D image sequences. The TensorRT library supports only vector input sequences. For vector sequence inputs, the number of features must be a constant during code generation. For image sequence inputs, the height, width, and the number of channels must be a constant during code generation. Code generation does not support | Yes | Yes | No |
| A bidirectional LSTM (BiLSTM) layer learns bidirectional long-term dependencies between time steps of time series or sequence data. These dependencies can be useful when you want the network to learn from the complete time series at each time step. For code generation, the For code generation, the | Yes | Yes | No |
| A flatten layer collapses the spatial dimensions of the input into the channel dimension. | Yes | No | No |
| A GRU layer learns dependencies between time steps in time series and sequence data. Code generation supports only the | Yes | Yes | No |
| An LSTM layer learns long-term dependencies between time steps in time series and sequence data. For code generation, the For code generation, the | Yes | Yes | No |
| A sequence folding layer converts a batch of image sequences to a batch of images. Use a sequence folding layer to perform convolution operations on time steps of image sequences independently. | Yes | No | No |
| A sequence unfolding layer restores the sequence structure of the input data after sequence folding. | Yes | No | No |
| A word embedding layer maps word indices to vectors. | Yes | Yes | No |
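The requirement that the number of features stays constant during code generation, while the sequence length may vary, can be expressed with `coder.typeof`. The following is a minimal sketch under assumed values: the feature count of 3 and the entry-point name `exampleSeqPredict` are hypothetical.

```matlab
% Hypothetical number of features per time step; fixed at code generation time
numFeatures = 3;

% Input type: numFeatures rows (fixed size), unbounded number of time steps
% (variable size). The third argument marks which dimensions may vary.
seqType = coder.typeof(single(0), [numFeatures Inf], [false true]);

% With a configuration object cfg and an entry-point function on the path,
% the type would be passed to codegen like this:
%   codegen -config cfg exampleSeqPredict -args {seqType}
disp(seqType);
```

Marking only the second dimension as variable keeps the feature dimension a compile-time constant, which is what the sequence input layer row asks for.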
Activation Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| A clipped ReLU layer performs a threshold operation, where an input value less than zero is set to zero and a value above the clipping ceiling is set to that clipping ceiling. | Yes | Yes | Yes |
| An ELU activation layer performs the identity operation on positive inputs and an exponential nonlinearity on negative inputs. | Yes | Yes | No |
| A leaky ReLU layer performs a threshold operation, where an input value less than zero is multiplied by a fixed scalar. | Yes | Yes | Yes |
| A ReLU layer performs a threshold operation on each element of the input, where a value less than zero is set to zero. | Yes | Yes | Yes |
| A GELU layer weights the input by its probability under a Gaussian distribution. | Yes | Yes | Yes |
| A | Yes | Yes | No |
| A swish activation layer applies the swish function on the layer inputs. | Yes | Yes | No |
| A hyperbolic tangent (tanh) activation layer applies the tanh function on the layer inputs. | Yes | Yes | Yes |
Normalization, Dropout, and Cropping Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| A batch normalization layer normalizes each input channel across a mini-batch. | Yes | Yes | Yes |
| A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization. | Yes | Yes | Yes |
| A group normalization layer normalizes a mini-batch of data across grouped subsets of channels for each observation independently. | Yes | Yes | No |
| A layer normalization layer normalizes a mini-batch of data across all channels for each observation independently. | Yes | Yes | No |
| A 2-D crop layer applies 2-D cropping to the input. | Yes | Yes | Yes |
| A dropout layer randomly sets input elements to zero with a given probability. | Yes | Yes | Yes |
| Scaling layer for actor or critic network. For code generation, values for the | Yes | Yes | Yes |
Pooling and Unpooling Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| An average pooling layer performs down-sampling by dividing the input into rectangular pooling regions and computing the average values of each region. You can generate code using the NVIDIA® cuDNN and TensorRT libraries by using the For Simulink® models that implement deep learning functionality using a MATLAB Function block, simulation errors out if the network contains an average pooling layer with a nonzero padding value. In such cases, use the blocks from the Deep Neural Networks library instead of a MATLAB Function block to implement the deep learning functionality. | Yes | Yes | Yes |
| A global average pooling layer performs down-sampling by computing the mean of the height and width dimensions of the input. | Yes | Yes | Yes |
| A global max pooling layer performs down-sampling by computing the maximum of the height and width dimensions of the input. | Yes | Yes | Yes |
| A max pooling layer performs down-sampling by dividing the input into rectangular pooling regions and computing the maximum of each region. If equal max values exist along the off-diagonal in a kernel window, implementation differences for the | Yes | Yes | Yes |
| A max unpooling layer unpools the output of a max pooling layer. If equal max values exist along the off-diagonal in a kernel window, implementation differences for the | Yes | Yes | No |
Combination Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| An addition layer adds inputs from multiple neural network layers element-wise. | Yes | Yes | Yes |
| A concatenation layer takes inputs and concatenates them along a specified dimension. | Yes | Yes | No |
| A depth concatenation layer takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension). | Yes | Yes | Yes |
Object Detection Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| An anchor box layer stores anchor boxes for a feature map used in object detection networks. | Yes | Yes | Yes |
| A 2-D depth to space layer permutes data from the depth dimension into blocks of 2-D spatial data. | Yes | Yes | Yes |
| A space to depth layer permutes the spatial blocks of the input into the depth dimension. Use this layer when you need to combine feature maps of different size without discarding any feature data. | Yes | Yes | Yes |
| An SSD merge layer merges the outputs of feature maps for subsequent regression and classification loss computation. | Yes | Yes | No |
| Create reorganization layer for YOLO v2 object detection network. | Yes | Yes | Yes |
| Create transform layer for YOLO v2 object detection network. | Yes | Yes | Yes |
| A box regression layer refines bounding box locations by using a smooth L1 loss function. Use this layer to create a Fast or Faster R-CNN object detection network. | Yes | Yes | Yes |
| A focal loss layer predicts object classes using focal loss. | Yes | Yes | Yes |
| A region proposal network (RPN) classification layer classifies image regions as either object or background by using a cross entropy loss function. Use this layer to create a Faster R-CNN object detection network. | Yes | Yes | Yes |
| Create output layer for YOLO v2 object detection network. | Yes | Yes | Yes |
Output Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| A classification layer computes the cross entropy loss for multi-class classification problems with mutually exclusive classes. | Yes | Yes | Yes |
| A Dice pixel classification layer provides a categorical label for each image pixel or voxel using generalized Dice loss. | Yes | Yes | Yes |
| A focal loss layer predicts object classes using focal loss. | Yes | Yes | Yes |
| A pixel classification layer provides a categorical label for each image pixel or voxel. | Yes | Yes | Yes |
| A box regression layer refines bounding box locations by using a smooth L1 loss function. Use this layer to create a Fast or Faster R-CNN object detection network. | Yes | Yes | Yes |
| A regression layer computes the half-mean-squared-error loss for regression problems. | Yes | Yes | Yes |
| A region proposal network (RPN) classification layer classifies image regions as either object or background by using a cross entropy loss function. Use this layer to create a Faster R-CNN object detection network. | Yes | Yes | Yes |
| A sigmoid layer applies a sigmoid function to the input. | Yes | Yes | Yes |
| A softmax layer applies a softmax function to the input. | Yes | Yes | Yes |
| Output layers including custom classification or regression output layers created by using For an example showing how to define a custom classification output layer and specify a loss function, see Define Custom Classification Output Layer (Deep Learning Toolbox). For an example showing how to define a custom regression output layer and specify a loss function, see Define Custom Regression Output Layer (Deep Learning Toolbox). | Yes | Yes | Yes |
Custom Keras Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| Clips the input between the upper and lower bounds. | Yes | Yes | No |
| Flatten activations into 1-D assuming C-style (row-major) order. | Yes | Yes | Yes |
| Global average pooling layer for spatial data. | Yes | Yes | Yes |
| Parametric rectified linear unit. | Yes | Yes | No |
| Sigmoid activation layer. | Yes | Yes | Yes |
| Hyperbolic tangent activation layer. | Yes | Yes | Yes |
| Flattens a sequence of input images into a sequence of vectors, assuming C-style (or row-major) storage ordering of the input layer. | Yes | Yes | No |
| Zero padding layer for 2-D input. | Yes | Yes | Yes |
Custom ONNX Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| Clips the input between the upper and lower bounds. | Yes | Yes | No |
| Layer that performs element-wise scaling of the input followed by an addition. | Yes | Yes | Yes |
| Flattens a MATLAB 2D image batch in the way ONNX does, producing a 2D output array with | Yes | Yes | No |
| Flattens the spatial dimensions of the input tensor to the channel dimensions. | Yes | Yes | Yes |
| Global average pooling layer for spatial data. | Yes | Yes | Yes |
| Layer that implements ONNX identity operator. | Yes | Yes | Yes |
| Parametric rectified linear unit. | Yes | Yes | No |
| Sigmoid activation layer. | Yes | Yes | Yes |
| Hyperbolic tangent activation layer. | Yes | Yes | Yes |
| Verify fixed batch size. | Yes | Yes | Yes |
Custom Layers
Layer Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
| Custom layers, with or without learnable parameters, that you define for your problem. To learn how to define custom deep learning layers, see Define Custom Deep Learning Layers (Deep Learning Toolbox) and Define Custom Deep Learning Layer for Code Generation (Deep Learning Toolbox). For an example of how to generate code for a network with custom layers, see Code Generation for Object Detection Using YOLO v3 Deep Learning Network. The outputs of the custom layer must be fixed-size arrays. cuDNN targets support both row-major and column-major code generation for custom layers. TensorRT targets support only column-major code generation. For code generation, custom layers must contain the Code generation for a sequence network that contains a custom layer and an LSTM or GRU layer is not supported. You can pass For unsupported function Z = predict(layer, X) if coder.target('MATLAB') Z = doPredict(X); else if isdlarray(X) X1 = extractdata(X); Z1 = doPredict(X1); Z = dlarray(Z1); else Z = doPredict(X); end end end | Yes | Yes | No |
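The `predict` pattern quoted in the Custom Layers row is easier to follow when laid out inside a complete layer definition. The class below is only a sketch: the layer name `exampleDoubleLayer` and the body of `doPredict` (doubling the input) are hypothetical stand-ins, and the `%#codegen` pragma is placed as the Deep Learning Toolbox custom layer examples for code generation typically show it.

```matlab
classdef exampleDoubleLayer < nnet.layer.Layer
    % Hypothetical custom layer used only to illustrate the predict pattern.

    %#codegen

    methods
        function layer = exampleDoubleLayer(name)
            % Set the layer name and a short description
            layer.Name = name;
            layer.Description = "Multiplies the input by 2";
        end

        function Z = predict(layer, X) %#ok<INUSL>
            if coder.target('MATLAB')
                % Running in MATLAB: operate on the input directly
                Z = doPredict(X);
            else
                % Generating code: unwrap a dlarray, compute, and rewrap
                if isdlarray(X)
                    X1 = extractdata(X);
                    Z1 = doPredict(X1);
                    Z = dlarray(Z1);
                else
                    Z = doPredict(X);
                end
            end
        end
    end
end

function Z = doPredict(X)
% Placeholder computation (assumption): scale every element by 2
Z = 2 .* X;
end
```

Keeping the numeric work in a helper such as `doPredict` means that the same computation runs in MATLAB and in generated code, and that only the wrapping and unwrapping differs between the two branches.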
Supported Classes
The following classes are supported for code generation by GPU Coder for the target deep learning libraries specified in the table.
Name | Description | cuDNN | TensorRT | ARM Compute Library for Mali GPU |
---|---|---|---|---|
DAGNetwork (Deep Learning Toolbox) | Directed acyclic graph (DAG) network for deep learning | Yes | Yes | Yes |
dlnetwork (Deep Learning Toolbox) | Deep learning network for custom training loops | Yes | Yes | No |
| PointPillars network to detect objects in lidar point clouds | Yes | Yes | No |
SeriesNetwork (Deep Learning Toolbox) | Series network for deep learning | Yes | Yes | Yes |
ssdObjectDetector (Computer Vision Toolbox) | Detect objects using the SSD-based detector | Yes | Yes | No |
yolov2ObjectDetector (Computer Vision Toolbox) | Detect objects using YOLO v2 object detector | Yes | Yes | Yes |
| Detect objects using YOLO v3 object detector | Yes | Yes | No |
| Detect objects using YOLO v4 object detector | Yes | Yes | No |
| Detect objects using YOLOX object detector | Yes | Yes | No |
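For the detector classes in the table, an entry-point function typically loads the trained object from a MAT-file and calls its `detect` method. The sketch below assumes a trained `yolov2ObjectDetector` saved in a MAT-file. The function name `exampleDetect`, the file name `exampleDetector.mat`, and the threshold value are hypothetical.

```matlab
function [bboxes, scores, labels] = exampleDetect(in) %#codegen
% Hypothetical entry-point function for a saved YOLO v2 detector.

% Keep the detector in a persistent variable so that it loads only once
persistent detector;
if isempty(detector)
    % 'exampleDetector.mat' is an assumed file that holds the trained detector
    detector = coder.loadDeepLearningNetwork('exampleDetector.mat');
end

% Run detection; the threshold value here is illustrative
[bboxes, scores, labels] = detect(detector, in, 'Threshold', 0.5);
end
```

The persistent variable avoids reconstructing the detector on every call, which matters when the generated code processes a stream of frames.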
Usage Notes and Limitations
The code generator represents characters in an 8-bit ASCII codeset that the locale setting determines. Therefore, the use of non-ASCII characters in class names, layer names, layer descriptions, or network names might result in errors. For more information, see Encoding of Characters in Code Generation.
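One way to catch this limitation early is to scan the names for characters outside the 7-bit ASCII range before you generate code. The following sketch uses a hypothetical list of layer names. For a real network, the names could be collected from its `Layers` property instead.

```matlab
% Hypothetical layer names; char(233) is a non-ASCII character (e with acute)
layerNames = {'conv1', 'relu1', ['conv' char(233)], 'fc'};

% A name counts as ASCII here if every character code is 127 or lower
isAscii = cellfun(@(s) all(double(s) <= 127), layerNames);
nonAsciiNames = layerNames(~isAscii);

if isempty(nonAsciiNames)
    disp('All layer names are ASCII.');
else
    fprintf('Consider renaming before code generation: %s\n', ...
        strjoin(nonAsciiNames, ', '));
end
```

The check is deliberately conservative: it flags any character above code 127, even though some locales can represent part of that range.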
See Also
Functions
Objects
coder.gpuConfig | coder.CodeConfig | coder.EmbeddedCodeConfig | coder.gpuEnvConfig | coder.CuDNNConfig | coder.TensorRTConfig
Related Topics
- Pretrained Deep Neural Networks (Deep Learning Toolbox)
- Get Started with Transfer Learning (Deep Learning Toolbox)
- Create Simple Deep Learning Neural Network for Classification (Deep Learning Toolbox)
- Load Pretrained Networks for Code Generation
- Code Generation for Deep Learning Networks by Using cuDNN
- Code Generation for Deep Learning Networks by Using TensorRT
- Code Generation for Deep Learning Networks Targeting ARM Mali GPUs