Quantization, Projection, and Pruning
Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:
- Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA® code from this pruned network.
- Projecting layers by performing principal component analysis (PCA) on the layer activations using a data set representative of the training data and applying linear projections on the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.
- Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from this quantized network.
For C/C++ and CUDA code generation, the software generates code for a convolutional deep neural network by quantizing the weights, biases, and activations of the convolution layers to 8-bit scaled integer data types. The quantization is performed by providing the calibration result file produced by the calibrate function to the codegen (MATLAB Coder) command. Code generation does not support quantized deep neural networks produced by the quantize function.
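For example, a minimal calibration sketch, assuming a pretrained network net and an image datastore calData representative of the training data (both names are placeholders):

```matlab
% Create a quantization object for the intended target.
% 'GPU' is an illustrative choice; 'CPU' and 'FPGA' are also supported.
quantObj = dlquantizer(net,'ExecutionEnvironment','GPU');

% Exercise the network with representative data to collect the
% dynamic ranges of weights, biases, and activations.
calResults = calibrate(quantObj,calData);

% Save the calibration result so the codegen command can consume it
% (see the target-specific sketches under Topics below).
save('quantObj.mat','quantObj');
```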
Functions
Apps
Deep Network Quantizer | Quantize a deep neural network to 8-bit scaled integer data types
Topics
Pruning
- Parameter Pruning and Quantization of Image Classification Network
Use parameter pruning and quantization to reduce network size.
- Prune Image Classification Network Using Taylor Scores
This example shows how to reduce the size of a deep neural network using Taylor pruning.
- Prune Filters in a Detection Network Using Taylor Scores
This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.
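The Taylor pruning examples above share the same basic loop. A condensed sketch, assuming a trained dlnetwork net, a minibatchqueue mbq over the training data, and a user-defined loss helper modelLossPruning (all placeholder names):

```matlab
% Wrap the network so that convolution filters can be scored and removed.
prunableNet = taylorPrunableNetwork(net);

numPruningIterations = 10;   % illustrative value
for iteration = 1:numPruningIterations
    shuffle(mbq);
    while hasdata(mbq)
        [X,T] = next(mbq);
        % modelLossPruning returns the loss plus the activations and
        % gradients needed to compute first-order Taylor scores.
        [loss,pruningActivations,pruningGradients] = ...
            dlfeval(@modelLossPruning,prunableNet,X,T);
        % Accumulate an importance score for every prunable filter.
        prunableNet = updateScore(prunableNet,pruningActivations, ...
            pruningGradients);
    end
    % Remove the lowest-scoring filters (fine-tuning between pruning
    % iterations is omitted here for brevity).
    prunableNet = updatePrunables(prunableNet,MaxToPrune=8);
end

% Convert back to a regular dlnetwork for retraining and code generation.
prunedNet = dlnetwork(prunableNet);
```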
Projection
- Compress Neural Network Using Projection
This example shows how to compress a neural network using projection and principal component analysis.
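A compact sketch of that workflow, assuming a trained dlnetwork net and a minibatchqueue mbq built from data representative of the training set (placeholder names; the reduction goal is an illustrative choice):

```matlab
% Run principal component analysis on the layer activations.
npca = neuronPCA(net,mbq);

% Apply linear projections to the learnable parameters of compatible
% layers, targeting roughly 50% fewer learnables.
netProjected = compressNetworkUsingProjection(net,npca, ...
    LearnablesReductionGoal=0.5);

% Fine-tuning netProjected afterward usually recovers most of the
% accuracy lost to compression.
```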
Deep Learning Quantization
- Quantization of Deep Neural Networks
Understand effects of quantization and how to visualize dynamic ranges of network convolution layers.
- Quantization Workflow Prerequisites
Products required for the quantization of deep learning networks.
- Prepare Data for Quantizing Networks
Supported datastores for quantization workflows.
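After calibration (see the sketch near the top of this page), you can validate quantized behavior and produce a quantized network object for simulation in MATLAB. A sketch, with valData a placeholder validation datastore:

```matlab
% Compare the simulated quantized network against validation data.
valResults = validate(quantObj,valData);

% Produce a quantized network for in-MATLAB simulation. Note that
% networks returned by quantize are not supported for code generation.
qNet = quantize(quantObj);

% Inspect which layers were quantized and to which data types.
qDetails = quantizationDetails(qNet)
```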
Quantization for GPU Target
- Generate INT8 Code for Deep Learning Networks (GPU Coder)
Quantize and generate code for a pretrained convolutional neural network.
- Quantize Residual Network Trained for Image Classification and Generate CUDA Code
This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
- Quantize Layers in Object Detectors and Generate CUDA Code
This example shows how to generate CUDA code for an SSD vehicle detector and a YOLO v2 vehicle detector that performs inference computations in 8-bit integers for the convolutional layers.
- Quantize Semantic Segmentation Network and Generate CUDA Code
This example shows how to quantize a convolutional neural network trained for semantic segmentation and generate CUDA code.
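The GPU examples above follow roughly this code generation pattern. A sketch, assuming predictFcn is a hypothetical entry-point function that loads the network with coder.loadDeepLearningNetwork, and quantObj.mat is the saved calibration result:

```matlab
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';

% Target the cuDNN library and request 8-bit integer inference using
% the saved calibration result.
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.DeepLearningConfig.DataType = 'int8';
cfg.DeepLearningConfig.CalibrationResultFile = 'quantObj.mat';

% Generate a CUDA MEX function for the entry point (the input size is
% illustrative).
codegen -config cfg predictFcn -args {ones(224,224,3,'single')}
```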
Quantization for FPGA Target
- Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types.
- Classify Images on FPGA Using Quantized Neural Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA.
- Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox to deploy a quantized GoogLeNet network to classify an image.
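The FPGA examples combine dlquantizer with a dlhdl.Workflow object. A rough sketch, assuming a Xilinx ZCU102 board reachable over Ethernet and an int8 bitstream (board, interface, bitstream, and variable names are illustrative):

```matlab
% Quantize for an FPGA target.
quantObj = dlquantizer(net,'ExecutionEnvironment','FPGA');
calibrate(quantObj,calData);

% Compile and deploy through Deep Learning HDL Toolbox.
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
hW = dlhdl.Workflow('Network',quantObj,'Bitstream','zcu102_int8', ...
    'Target',hTarget);
compile(hW);
deploy(hW);

% Run prediction on the board and profile the result.
prediction = predict(hW,img,'Profile','on');
```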
Quantization for CPU Target
- Generate int8 Code for Deep Learning Networks (MATLAB Coder)
Quantize and generate code for a pretrained convolutional neural network.
- Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
Generate code for a deep learning network that performs inference computations in 8-bit integers.
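On CPU targets such as Raspberry Pi, the pattern is the same but targets the ARM Compute Library. A sketch under the same assumptions (predictFcn is hypothetical; the library version and architecture values are illustrative):

```matlab
% Calibrate a dlquantizer with ExecutionEnvironment 'CPU' and save it
% as quantObj.mat before this step.
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
cfg.Hardware = coder.hardware('Raspberry Pi');

% ARM Compute Library configuration for 8-bit integer inference.
dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.ArmComputeVersion = '20.02.1';
dlcfg.ArmArchitecture = 'armv7';
dlcfg.DataType = 'int8';
dlcfg.CalibrationResultFile = 'quantObj.mat';
cfg.DeepLearningConfig = dlcfg;

codegen -config cfg predictFcn -args {ones(224,224,3,'single')}
```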