Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud
Training deep networks is computationally intensive and can take many hours of computing time; however, neural networks are inherently parallel algorithms. You can take advantage of this parallelism by training in parallel using high-performance GPUs and computer clusters.
It is recommended to train using one or more GPUs. Use a single CPU or multiple CPUs only if you do not have a GPU. CPUs are normally much slower than GPUs for both training and inference, and running on a single GPU typically offers much better performance than running on multiple CPU cores.
If you do not have a suitable GPU, you can rent high-performance GPUs and clusters in the cloud. For more information on how to access MATLAB® in the cloud for deep learning, see Deep Learning in the Cloud.
Using a GPU or parallel options requires Parallel Computing Toolbox™. Using a GPU also requires a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). Using a remote cluster also requires MATLAB Parallel Server™.
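For example, you can check for a suitable GPU before training. This is a minimal sketch; it assumes Parallel Computing Toolbox is installed.

```matlab
% Check whether a supported GPU is available before training.
if canUseGPU
    gpu = gpuDevice;    % select and query the current GPU
    fprintf("Training on GPU: %s\n", gpu.Name);
else
    disp("No supported GPU detected; training falls back to the CPU.");
end
```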
Tip
For trainnet and trainNetwork workflows, GPU support is automatic. By default, the trainnet and trainNetwork functions use a GPU if one is available. If you have access to a machine with multiple GPUs, specify the ExecutionEnvironment training option as "multi-gpu".

To run custom training workflows, including dlnetwork workflows, on the GPU, use minibatchqueue to automatically convert data to gpuArray objects.
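For example, the following sketch shows both approaches. The datastore ds is a placeholder for your own training data.

```matlab
% trainnet and trainNetwork workflows: target multiple GPUs explicitly.
options = trainingOptions("adam", ...
    ExecutionEnvironment="multi-gpu", ...
    Plots="training-progress");

% Custom training workflows: let minibatchqueue move data to the GPU.
mbq = minibatchqueue(ds, ...
    MiniBatchSize=128, ...
    OutputEnvironment="auto");   % uses a GPU when one is available
```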
You can use parallel resources to scale up deep learning for a single network. You can also train multiple networks simultaneously. The following sections show the available options for deep learning in parallel in MATLAB:
Note
If you run MATLAB on a single remote machine, for example, a cloud machine that you connect to via SSH or remote desktop protocol, then follow the steps for local resources. For more information on connecting to cloud resources, see Deep Learning in the Cloud.
Train Single Network in Parallel
Use Local Resources to Train Single Network in Parallel
The following table shows the available options for training and inference with a single network on your local workstation.
| Resource | trainnet and trainNetwork Workflows | Custom Training Workflows | Required Products |
| --- | --- | --- | --- |
| Single CPU | Automatic if no GPU is available. Training using a single CPU is not recommended. | Training using a single CPU is not recommended. | MATLAB, Deep Learning Toolbox |
| Multiple CPU cores | Training using multiple CPU cores is not recommended if you have access to a GPU. | Training using multiple CPU cores is not recommended if you have access to a GPU. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox |
| Single GPU | Automatic. By default, training and inference run on the GPU if one is available. Alternatively, specify the ExecutionEnvironment training option as "gpu". | Use minibatchqueue to automatically convert data to gpuArray objects. For an example, see Train Network Using Custom Training Loop. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox |
| Multiple GPUs | Specify the ExecutionEnvironment training option as "multi-gpu". For an example, see Train Network Using Automatic Multi-GPU Support. | Start a local parallel pool with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use minibatchqueue to automatically convert data to gpuArray objects on each worker. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox |
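For custom training workflows on multiple local GPUs, you first start a pool with one worker per GPU. A minimal sketch, assuming the default "Processes" cluster profile:

```matlab
% Start a local pool with one worker per available GPU.
numGPUs = gpuDeviceCount("available");
if numGPUs > 1
    pool = parpool("Processes", numGPUs);
end
```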
Use Remote Cluster Resources to Train Single Network in Parallel
The following table shows the available options for training and inference with a single network on a remote cluster.
| Resource | trainnet Workflows | trainNetwork Workflows | Custom Training Workflows | Required Products |
| --- | --- | --- | --- | --- |
| Any | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel". If the pool has access to GPUs, then only workers with a unique GPU perform training computation and excess workers become idle. If the pool does not have GPUs, then training takes place on all available CPU workers instead. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel". If the pool has access to GPUs, then only workers with a unique GPU perform training computation and excess workers become idle. If the pool does not have GPUs, then training takes place on all available CPU workers instead. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use minibatchqueue to manage mini-batches on each worker. By default, the software performs calculations using only the CPU. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server |
| Multiple CPUs | Training using multiple CPU cores is not recommended if you have access to a GPU. Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option to use CPU workers. If the pool has access to GPUs, the GPUs will not be used. | Training using multiple CPU cores is not recommended if you have access to a GPU. | Training using multiple CPU cores is not recommended if you have access to a GPU. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server |
| Multiple GPUs | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option. Only workers with a unique GPU perform training computation; excess workers become idle. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Specify the ExecutionEnvironment training option as "parallel". If the pool has access to GPUs, then only workers with a unique GPU perform training computation and excess workers become idle. If the pool does not have GPUs, then training takes place on all available CPU workers instead. For an example, see Train Network in the Cloud Using Automatic Parallel Support. | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use minibatchqueue to automatically convert data to gpuArray objects on each worker. For an example, see Train Network in Parallel with Custom Training Loop. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox, MATLAB Parallel Server |
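For example, after setting your cluster as the default cluster profile, you can train with automatic parallel support. This is a sketch; ds and layers are placeholders for your own data and network.

```matlab
% Train using the default cluster profile with automatic parallel support.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="parallel", ...   % starts a pool on the default cluster
    Verbose=true);
net = trainnet(ds, layers, "crossentropy", options);
```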
Use Deep Network Designer and Experiment Manager to Train Single Network in Parallel
You can train a single network in parallel using Deep Network Designer. You can train using local resources or a remote cluster.
- To train locally using multiple GPUs, set the ExecutionEnvironment option to multi-gpu in the Training Options dialog.
- To train using a remote cluster, set the ExecutionEnvironment option to parallel in the Training Options dialog. If there is no current parallel pool, the software starts one using the default cluster profile. If the pool has access to GPUs, then only workers with a unique GPU perform training computation. If the pool does not have GPUs, then training takes place on all available CPU workers instead.
You can use Experiment Manager to run a single trial using multiple parallel workers. For more information, see Use Experiment Manager to Train Networks in Parallel.
Train Multiple Networks in Parallel
Use Local or Remote Cluster Resources to Train Multiple Networks in Parallel
To train multiple networks in parallel, train each network on a different parallel worker. You can modify the network or training parameters on each worker to perform parameter sweeps in parallel.
Use parfor (Parallel Computing Toolbox) or parfeval (Parallel Computing Toolbox) to train a single network on each worker. To run in the background without blocking your local MATLAB, use parfeval. You can plot results using the OutputFcn training option.
You can run locally or using a remote cluster. Using a remote cluster requires MATLAB Parallel Server.
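For example, a minimal parameter-sweep sketch using parfor. The datastore ds, the layer array layers, and the learning rates are placeholders for your own setup.

```matlab
% Train one network per worker, sweeping over the initial learning rate.
learnRates = [1e-2 1e-3 1e-4];
trainedNets = cell(numel(learnRates), 1);

parfor i = 1:numel(learnRates)
    opts = trainingOptions("adam", ...
        InitialLearnRate=learnRates(i), ...
        Verbose=false);
    trainedNets{i} = trainnet(ds, layers, "crossentropy", opts);
end
```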
| Resource | trainnet and trainNetwork Workflows | Custom Training Workflows | Required Products |
| --- | --- | --- | --- |
| Multiple CPUs | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to train a single network on each worker. For an example, see Train Deep Learning Networks in Parallel. | Specify the desired cluster as your default cluster profile. For more information, see Manage Cluster Profiles and Automatic Pool Creation. Use parfor or parfeval to run a custom training loop on each worker. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox (MATLAB Parallel Server for a remote cluster) |
| Multiple GPUs | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to train a single network on each worker. For an example, see Train Deep Learning Networks in Parallel. | Start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs. Use parfor or parfeval to run a custom training loop on each worker. Convert each mini-batch of data to gpuArray. | MATLAB, Deep Learning Toolbox, Parallel Computing Toolbox (MATLAB Parallel Server for a remote cluster) |
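To train in the background without blocking MATLAB, you can use parfeval instead. A sketch, again with ds and layers as placeholders:

```matlab
% Submit training to the current pool and continue working.
pool = gcp;   % starts a pool using the default profile if none is open
options = trainingOptions("adam", Verbose=false);
f = parfeval(pool, @trainnet, 1, ds, layers, "crossentropy", options);

% Later, collect the trained network when it is ready.
net = fetchOutputs(f);
```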
Use Experiment Manager to Train Multiple Networks in Parallel
You can use Experiment Manager to run trials on multiple parallel workers simultaneously. Set up your parallel environment and enable the Use Parallel option before running your experiment. Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool. For more information, see Use Experiment Manager to Train Networks in Parallel.
Batch Deep Learning
You can offload deep learning computations to run in the background using the
batch
(Parallel Computing Toolbox) function. This means that
you can continue using MATLAB while your computation runs in the background, or you can close your
client MATLAB and fetch results later.
You can run batch jobs in a local or remote cluster. To offload your deep learning
computations, use batch
to submit a script or function that
runs in the cluster. You can perform any kind of deep learning computation as a
batch job, including parallel computations. For an example, see Send Deep Learning Batch Job to Cluster.
When you submit a batch job as a script, by default, workspace variables are copied from the client to the workers. To avoid copying workspace variables to the workers, submit batch jobs as functions.
To run in parallel, use a script or function that contains the same code that you would use to run in parallel locally or in a cluster. For example, your script or function can run trainnet or trainNetwork with the ExecutionEnvironment training option set to "parallel", or run a custom training loop in parallel. Use batch to submit the script or function to the cluster and use the Pool option to specify the number of workers you want to use. For more information on running parallel computations with batch, see Run Batch Parallel Jobs (Parallel Computing Toolbox).
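For example, a minimal sketch of submitting a training function as a batch job. The function name trainMyNetwork and the path are placeholders, and the AdditionalPaths option is described below.

```matlab
% Submit a function that trains in parallel; use 4 workers for its pool
% plus one worker to run the job itself.
job = batch(@trainMyNetwork, 1, {}, ...
    Pool=4, ...
    AdditionalPaths="/path/to/code");   % make code and data files visible to workers
```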
To run deep learning computation on multiple networks, it is recommended to submit a single batch job for each network. Doing so avoids the overhead required to start a parallel pool in the cluster and allows you to use the job monitor to observe the progress of each network computation individually.
You can submit multiple batch jobs. If the submitted jobs require more workers than are currently available in the cluster, then later jobs are queued until earlier jobs have finished. Queued jobs start when enough workers are available to run the job.
The default search paths of the workers might not be the same as those of your client MATLAB. To ensure that workers in the cluster have access to the files they need, such as code files, data files, or model files, specify paths to add to the workers using the AdditionalPaths option.
To retrieve results after the job is finished, use the fetchOutputs (Parallel Computing Toolbox) function. fetchOutputs retrieves all variables in the batch worker workspace. When you submit batch jobs as a script, by default, workspace variables are copied from the client to the workers. To avoid unnecessary copying of workspace variables, submit batch jobs as functions instead of as scripts.
You can use the diary (Parallel Computing Toolbox) function to capture command-line output while running batch jobs. This can be useful when executing the trainnet or trainNetwork function with the Verbose option set to true.
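For example, continuing the batch sketch above, you can monitor and collect results like this:

```matlab
wait(job);                     % block until the job finishes (optional)
diary(job)                     % display command-line output captured on the workers
results = fetchOutputs(job);   % retrieve the results of the job
```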
Manage Cluster Profiles and Automatic Pool Creation
Parallel Computing Toolbox comes pre-configured with the cluster profile
Processes
for running parallel code on your local desktop
machine. By default, MATLAB starts all parallel pools using the Processes
cluster profile. If you want to run code on a remote cluster, you must start a
parallel pool using the remote cluster profile. You can manage cluster profiles
using the Cluster Profile Manager. For more information about managing cluster
profiles, see Discover Clusters and Use Cluster Profiles (Parallel Computing Toolbox).
Some functions, including trainnet, trainNetwork, predict, classify, parfor, and parfeval, can automatically start a parallel pool. To take advantage of automatic parallel pool creation, set your desired cluster as the default cluster profile in the Cluster Profile Manager. Alternatively, you can create the pool manually and specify the desired cluster resource when you create the pool.
If you want to use multiple GPUs in a remote cluster to train multiple networks in parallel or for custom training loops, best practice is to manually start a parallel pool in the desired cluster with as many workers as available GPUs. For more information, see Deep Learning with MATLAB on Multiple GPUs.
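For example, a sketch of manually starting a pool on a cluster. The profile name and worker count are placeholders for your own cluster.

```matlab
% Start a pool on the cluster with one worker per available GPU.
c = parcluster("MyClusterProfile");
numWorkers = 8;                  % for example, one worker per GPU in the cluster
pool = parpool(c, numWorkers);
```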
Deep Learning Precision
For best performance, it is recommended to use a GPU for all deep learning workflows. Because single-precision and double-precision performance of GPUs can differ substantially, it is important to know in which precision computations are performed. Typically, GPUs offer much better performance for calculations in single precision.
If you use a GPU only for deep learning, then single-precision performance is one of the most important characteristics of a GPU. If you also use a GPU for other computations using Parallel Computing Toolbox, then high double-precision performance is important. This is because many functions in MATLAB use double-precision arithmetic by default. For more information, see Perform Calculations in Single Precision (Parallel Computing Toolbox).
When you train a neural network using the trainnet or trainNetwork functions, or when you use prediction or validation functions with DAGNetwork and SeriesNetwork objects, the software performs these computations using single-precision, floating-point arithmetic. Functions for prediction and validation include predict, classify, and activations.
The software uses single-precision arithmetic whether you train neural networks using CPUs or GPUs.
For custom training workflows, it is recommended to convert data to single
precision for training and inference. If you use minibatchqueue
to manage mini-batches, your data is converted to single precision by default.
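For example, a sketch of keeping data in single precision in a custom training workflow. The datastore ds and the array X are placeholders.

```matlab
% minibatchqueue casts mini-batches to single precision by default.
mbq = minibatchqueue(ds, ...
    MiniBatchSize=64, ...
    OutputCast="single", ...        % the default, shown here for clarity
    OutputEnvironment="auto");      % move mini-batches to the GPU when available

% If you prepare data yourself, convert it to single precision explicitly.
X = rand(224,224,3,16);             % placeholder batch in double precision
X = single(X);
```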
Reproducibility
To provide the best performance, deep learning using a GPU in MATLAB is not guaranteed to be deterministic. Depending on your network architecture, under some conditions you might get different results when using a GPU to train two identical networks or make two predictions using the same network and data.
See Also
trainingOptions | minibatchqueue | trainnet | trainNetwork | Deep Network Designer | Experiment Manager
Related Topics
- Deep Learning with MATLAB on Multiple GPUs
- Resolve GPU Memory Issues
- Run MATLAB using GPUs in the Cloud (Parallel Computing Toolbox)
- Deep Learning with Big Data
- Deep Learning in the Cloud
- Train Deep Learning Networks in Parallel
- Send Deep Learning Batch Job to Cluster
- Work with Deep Learning Data in the Cloud
- Use Experiment Manager to Train Networks in Parallel