rlTrainingOptions
Options for training reinforcement learning agents
Description
Use an rlTrainingOptions object to specify options to train an agent within an environment. Training options include the maximum number of episodes to train, criteria for stopping training, criteria for saving agents, and options for using parallel computing. To train the agent using the specified options, pass this object to train.
For more information on training agents, see Train Reinforcement Learning Agents.
Creation
Description
trainOpts = rlTrainingOptions returns the default options for training a reinforcement learning agent.
trainOpts = rlTrainingOptions(Name=Value) creates the training option set trainOpts and sets its Properties using one or more name-value arguments.
Properties
MaxEpisodes
— Maximum number of episodes to train the agents
500 (default) | positive integer
Maximum number of episodes to train the agents, specified as a positive integer. Regardless of other criteria for termination, training terminates after MaxEpisodes episodes.
Example: MaxEpisodes=1000
MaxStepsPerEpisode
— Maximum number of environment steps to run per episode
500 (default) | positive integer
Maximum number of environment steps to run per episode, specified as a positive integer. In general, you define episode termination conditions in the environment. This value is the maximum number of steps to run in the episode if other termination conditions are not met.
Example: MaxStepsPerEpisode=1000
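Taken together, MaxEpisodes and MaxStepsPerEpisode bound the total number of environment steps in one training run by their product. The following arithmetic sketch uses the default value of both properties; it does not call the toolbox.

```matlab
% Default values of the two options, as listed above.
maxEpisodes        = 500;
maxStepsPerEpisode = 500;

% Worst case: every episode runs until MaxStepsPerEpisode is reached and
% no other stop criterion ends training before MaxEpisodes.
maxTotalSteps = maxEpisodes * maxStepsPerEpisode;
fprintf('At most %d environment steps\n', maxTotalSteps);
```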
StopOnError
— Option to stop training when error occurs
"on" (default) | "off"
Option to stop training when an error occurs during an episode, specified as "on" or "off". When this option is "off", errors are captured and returned in the SimulationInfo output of train, and training continues to the next episode.
Example: StopOnError="off"
SimulationStorageType
— Storage type for environment data
"memory" (default) | "file" | "none"
Storage type for environment data, specified as "memory", "file", or "none". This option specifies the type of storage used for data generated during training or simulation by a Simulink® environment. Specifically, the software saves anything that appears as the output of a sim (Simulink) command.
Note that this option does not affect (and is not affected by) any option to save agents during training specified within a training option object, or any data logged by a FileLogger or MonitorLogger object.
The default value is "memory", indicating that data is stored in an internal memory variable. When you set this option to "file", data is stored to disk in MAT-files, in the directory specified by the SaveSimulationDirectory property, using the MAT-file version specified by the SaveFileVersion property. When you set this option to "none", simulation data is not stored.
You can use this option to prevent out-of-memory issues during training or simulation.
Example: SimulationStorageType="none"
SaveSimulationDirectory
— Folder used to save environment data
"savedSims" (default) | string | character vector
Folder used to save environment data, specified as a string or character vector. The folder name can contain a full or relative path. When you set the SimulationStorageType property to "file", the software saves data generated during training or simulation by a Simulink environment in MAT-files in this folder, using the MAT-file version specified by the SaveFileVersion property. If the folder does not exist, the software creates it.
Example: "envSimData"
SaveFileVersion
— MAT-file version used to save environment data
"-v7" (default) | "-v7.3" | "-v6"
MAT-file version used to save environment data, specified as a string or character vector. When you set the SimulationStorageType property to "file", the software saves data generated by a Simulink environment in MAT-files in the version specified by SaveFileVersion, in the folder specified by the SaveSimulationDirectory property. For more information, see MAT-File Versions.
Example: SaveFileVersion="-v7.3"
ScoreAveragingWindowLength
— Window length for averaging
5 (default) | positive integer scalar | positive integer vector
Window length for averaging the scores, rewards, and number of steps for each agent, specified as a scalar or vector.
If the training environment contains a single agent, specify ScoreAveragingWindowLength as a scalar.
If the training environment is a multi-agent environment, specify a scalar to apply the same window length to all agents. To use a different window length for each agent, specify ScoreAveragingWindowLength as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.
For options expressed in terms of averages, ScoreAveragingWindowLength is the number of episodes included in the average. For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 500 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 500. For the other agents, training continues until one of the following occurs:
All agents reach their stop criteria.
The number of episodes reaches MaxEpisodes.
You stop training by clicking the Stop Training button in Reinforcement Learning Training Monitor or pressing Ctrl-C at the MATLAB® command line.
Example: ScoreAveragingWindowLength=10
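To make the windowed average concrete, the following sketch computes a trailing average reward over the last ScoreAveragingWindowLength episodes for one agent and finds the first episode at which an "AverageReward" criterion of 500 is met. The reward history is made up, and the handling of the first few episodes (averaging over the episodes available so far) is an assumption of this sketch, not documented toolbox behavior.

```matlab
% Hypothetical reward history for one agent (one value per episode).
episodeReward = [420 480 510 530 495 505 520];
windowLength  = 5;     % ScoreAveragingWindowLength
stopValue     = 500;   % StopTrainingValue with StopTrainingCriteria="AverageReward"

% Trailing average over the last windowLength episodes. Before windowLength
% episodes exist, this sketch averages over the episodes available so far.
numEpisodes   = numel(episodeReward);
averageReward = zeros(1, numEpisodes);
for k = 1:numEpisodes
    first = max(1, k - windowLength + 1);
    averageReward(k) = mean(episodeReward(first:k));
end

% First episode at which the stop condition holds (empty if never met).
stopEpisode = find(averageReward >= stopValue, 1);
```

With this history the average first reaches 500 at episode 6 (the mean of episodes 2 through 6 is 504), even though single episodes exceeded 500 earlier.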
StopTrainingCriteria
— Training termination condition
"AverageSteps" (default) | "None" | "AverageReward" | "EpisodeReward" | "GlobalStepCount" | "EpisodeCount" | "EvaluationStatistic" | "Custom" | ...
Training termination condition, specified as one of the following strings:
"None": Do not stop training until the number of episodes reaches MaxEpisodes.
"AverageSteps": Stop training when the running average number of steps per episode equals or exceeds the critical value specified by the option StopTrainingValue. The average is computed using the window ScoreAveragingWindowLength.
"AverageReward": Stop training when the running average reward equals or exceeds the critical value.
"EpisodeReward": Stop training when the reward in the current episode equals or exceeds the critical value.
"GlobalStepCount": Stop training when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.
"EpisodeCount": Stop training when the number of training episodes equals or exceeds the critical value.
"EvaluationStatistic": Stop training when the statistic returned by the evaluator object used with train (if any) equals or exceeds the specified value.
"Custom": Stop training when the custom function specified in StopTrainingValue returns true.
Example: StopTrainingCriteria="AverageReward"
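The built-in criteria above can be read as simple comparisons against the critical value. The following sketch encodes each of them as a function handle operating on a hypothetical per-episode history struct (this struct is not a toolbox data type). "EvaluationStatistic" and "Custom" are left out because they depend on an evaluator object or a user function.

```matlab
% Hypothetical per-episode history for one agent (not toolbox data).
history.EpisodeReward = [10 40 80 120];
history.EpisodeSteps  = [50 120 200 200];
N = 2;   % ScoreAveragingWindowLength

% Trailing mean of the last N entries of a vector.
lastMean = @(x) mean(x(max(1, numel(x) - N + 1):end));

% One check per built-in criterion, following the descriptions above.
% Each returns true when the monitored quantity equals or exceeds v.
checks.None            = @(h, v) false;   % only MaxEpisodes ends training
checks.AverageSteps    = @(h, v) lastMean(h.EpisodeSteps)  >= v;
checks.AverageReward   = @(h, v) lastMean(h.EpisodeReward) >= v;
checks.EpisodeReward   = @(h, v) h.EpisodeReward(end)      >= v;
checks.GlobalStepCount = @(h, v) sum(h.EpisodeSteps)       >= v;
checks.EpisodeCount    = @(h, v) numel(h.EpisodeReward)    >= v;

criteria = 'AverageReward';   % StopTrainingCriteria
value    = 100;               % StopTrainingValue
stopNow  = checks.(criteria)(history, value);
```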
StopTrainingValue
— Critical value of training termination condition
500 (default) | scalar | vector | function name | function handle | anonymous function handle
Critical value of the training termination condition, specified as a scalar, a vector, or a function name or handle.
You can use a custom stop criterion by specifying StopTrainingValue as a function name or handle. Your function must have one input and one output, as shown in the following signature.
flag = myTerminationFcn(trainingStats)
Here, trainingStats is a structure that contains the following fields, all described in the trainStats output argument of train.
EpisodeIndex
EpisodeReward
EpisodeSteps
AverageReward
TotalAgentSteps
EpisodeQ0
SimulationInfo
EvaluationStatistics
TrainingOptions
Training stops when the specified function returns true.
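As an illustration, a custom stop criterion can be an anonymous function of trainingStats. The thresholds below (episode 100, average reward 480) are hypothetical. The sketch indexes each field with (end), so it works whether a field holds only the current-episode value or a per-episode history; check the field contents in your release.

```matlab
% Hypothetical custom stop criterion: stop once the running average reward
% reaches 480, but never before episode 100.
myTerminationFcn = @(trainingStats) ...
    trainingStats.EpisodeIndex(end) >= 100 && ...
    trainingStats.AverageReward(end) >= 480;
```

You could then pass it as rlTrainingOptions(StopTrainingCriteria="Custom", StopTrainingValue=myTerminationFcn).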
When you do not use a custom termination criterion, the following rules apply.
If the training environment contains a single agent, specify StopTrainingValue as a scalar. If the training environment is a multi-agent environment, specify a scalar to apply the same termination criterion to all agents. To use a different termination criterion for each agent, specify StopTrainingValue as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.
For a given agent, training ends when the termination condition specified by the StopTrainingCriteria option equals or exceeds this value. For the other agents, training continues until one of the following occurs:
All agents reach their stop criteria.
The number of episodes reaches MaxEpisodes.
You stop training by clicking the Stop Training button in Reinforcement Learning Training Monitor or pressing Ctrl-C at the MATLAB command line.
For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 100 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 100.
Example: StopTrainingValue=100
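For a multi-agent environment, the per-agent rule above can be sketched as follows. The three agents, their stop values, and their current average rewards are hypothetical; a manual stop from the training monitor or Ctrl-C is not modeled.

```matlab
% Hypothetical three-agent case with StopTrainingCriteria="AverageReward".
stopValue     = [100 200 150];   % StopTrainingValue, in environment-creation order
averageReward = [120 180 160];   % current running average reward of each agent
maxEpisodes   = 500;             % MaxEpisodes
episode       = 230;             % current episode index

% Agents whose own criterion is met (training ends for these agents).
agentDone = averageReward >= stopValue;

% The run continues until every agent is done or MaxEpisodes is reached.
trainingDone = all(agentDone) || episode >= maxEpisodes;
```

Here agents 1 and 3 have met their criteria, but training continues because agent 2 has not.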
SaveAgentCriteria
— Condition for saving agents during training
"None" (default) | "EpisodeReward" | "AverageSteps" | "AverageReward" | "GlobalStepCount" | "EpisodeCount" | "EpisodeFrequency" | "EvaluationStatistic" | "Custom" | ...
Condition for saving agents during training, specified as one of the following strings:
"None": Do not save any agents during training.
"EpisodeReward": Save all the agents when an agent reward in the current episode equals or exceeds the critical value specified in SaveAgentValue.
"AverageSteps": Save the agents when the running average number of steps per episode equals or exceeds the critical value specified by the option SaveAgentValue. The average is computed using the window specified in ScoreAveragingWindowLength.
"AverageReward": Save the agents when the running average reward over all episodes equals or exceeds the critical value.
"GlobalStepCount": Save the agents when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.
"EpisodeCount": Save the agents when the number of training episodes equals or exceeds the critical value.
"EpisodeFrequency": Save the agents with a period specified in SaveAgentValue. For example, if SaveAgentCriteria is specified as "EpisodeFrequency" and SaveAgentValue is specified as 10, the agent is saved after every ten episodes.
"EvaluationStatistic": Save the agents when the statistic returned by the evaluator object used with train (if any) equals or exceeds the specified value.
"Custom": Save the agents when the custom function specified in SaveAgentValue returns true.
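The "EpisodeFrequency" example above can be checked with a short sketch that lists the episodes at which agents would be saved during a 35-episode run, together with the single-agent AgentK.mat file names described below. It assumes that saving happens when the episode index is a multiple of SaveAgentValue; the run length is hypothetical.

```matlab
% SaveAgentCriteria="EpisodeFrequency" with SaveAgentValue=10, assuming a
% save whenever the episode index is a multiple of the period.
saveAgentValue = 10;
numEpisodes    = 35;
episodes       = 1:numEpisodes;
savedEpisodes  = episodes(mod(episodes, saveAgentValue) == 0);

% Corresponding single-agent file names (AgentK.mat).
savedFiles = arrayfun(@(k) sprintf('Agent%d.mat', k), savedEpisodes, ...
                      'UniformOutput', false);
```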
Set this option to store candidate agents that perform well according to the criteria you specify. When you set this option to a value other than "none", the software sets the SaveAgentValue option to 500. You can change that value to specify the condition for saving the agent.
For instance, suppose you want to store for further testing any agent that yields an episode reward that equals or exceeds 100. To do so, set SaveAgentCriteria to "EpisodeReward" and set the SaveAgentValue option to 100. When an episode reward equals or exceeds 100, train saves the corresponding agent (or all the corresponding agents for a multi-agent environment) in a MAT-file in the folder specified by the SaveAgentDirectory option. The MAT-file is called AgentK.mat (or AgentsK.mat for a multi-agent environment), where K is the number of the corresponding episode. The agents are stored within that MAT-file as the saved_agent array. Note that the MAT-file also includes the variable savedAgentResult, which contains the training result information up to the corresponding episode.
Example: SaveAgentCriteria="EpisodeReward"
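To retrieve a saved agent later, you can load the two variables described above from the corresponding MAT-file. The helper name loadSavedAgent is hypothetical; the sketch assumes a single-agent run, so the file follows the AgentK.mat naming pattern.

```matlab
% Hypothetical helper: load what train saved for a single-agent run at
% episode k, using the AgentK.mat naming described above.
loadSavedAgent = @(saveDir, k) load(fullfile(saveDir, sprintf('Agent%d.mat', k)), ...
                                    'saved_agent', 'savedAgentResult');

% Example usage (requires that savedAgents/Agent120.mat exists):
% s      = loadSavedAgent('savedAgents', 120);
% agent  = s.saved_agent;        % saved agent(s)
% result = s.savedAgentResult;   % training results up to episode 120
```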
SaveAgentValue
— Critical value of condition for saving agents
"none" (default) | 500 | scalar | vector | function name | function handle | anonymous function handle
Critical value of the condition for saving agents, specified as a scalar, a vector, or a function name or handle.
You can use a custom save criterion by specifying SaveAgentValue as a function name or handle. Your function must have one input and one output, as shown in the following signature.
flag = mySaveFcn(trainingStats)
Here, trainingStats is a structure that contains the following fields, all described in the trainStats output argument of train.
EpisodeIndex
EpisodeReward
EpisodeSteps
AverageReward
TotalAgentSteps
EpisodeQ0
SimulationInfo
EvaluationStatistic
TrainingOptions
The software saves the agents when the specified function returns true.
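A custom save criterion has the same shape as a custom stop criterion. In this hypothetical sketch, agents are saved every 25 episodes, but only after the running average reward has reached 200. As before, (end) indexing is used so the function works whether the fields hold scalars or per-episode histories.

```matlab
% Hypothetical custom save criterion: save every 25 episodes, but only once
% the running average reward has reached 200.
mySaveFcn = @(trainingStats) ...
    mod(trainingStats.EpisodeIndex(end), 25) == 0 && ...
    trainingStats.AverageReward(end) >= 200;
```

You could then pass it as rlTrainingOptions(SaveAgentCriteria="Custom", SaveAgentValue=mySaveFcn).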
When you do not use a custom saving criterion, the following rules apply.
If the training environment contains a single agent, specify SaveAgentValue as a scalar.
If the training environment is a multi-agent environment, specify a scalar to apply the same saving criterion to each agent. To save the agents when any one of them meets its own criterion, specify SaveAgentValue as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used when creating the environment. When a criterion for saving an agent is met, all agents are saved in the same MAT-file.
When you specify a condition for saving candidate agents using SaveAgentCriteria, the software sets this value to 500. Change the value to specify the condition for saving the agent. See the SaveAgentCriteria option for more details.
Example: SaveAgentValue=100
SaveAgentDirectory
— Folder name for saved agents
"savedAgents" (default) | string | character vector
Folder name for saved agents, specified as a string or character vector. The folder name can contain a full or relative path. When an episode occurs in which the conditions specified by the SaveAgentCriteria and SaveAgentValue options are satisfied, the software saves the current agent in a MAT-file in this folder. If the folder does not exist, the training function creates it. When SaveAgentCriteria is "none", this option is ignored and no folder is created.
Example: SaveAgentDirectory = pwd + "\run1\Agents"
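The example above hard-codes the Windows path separator. A sketch of a platform-independent way to build the same folder name uses fullfile; the run1 and Agents folder names are only the ones from the example.

```matlab
% Build the folder name from the example above without hard-coding the
% path separator, so the same code also works on Linux and macOS.
saveDir = fullfile(pwd, 'run1', 'Agents');
```

You could then pass it as rlTrainingOptions(SaveAgentDirectory=saveDir).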
Verbose
— Option to display training progress at the command line
false (0) (default) | true (1)
Option to display training progress at the command line, specified as the logical values false (0) or true (1). Set to true to write information from each training episode to the MATLAB command line during training.
Example: Verbose=true
Plots
— Option to display training progress with Reinforcement Learning Training Monitor
"training-progress" (default) | "none"
Option to display training progress with Reinforcement Learning Training Monitor, specified as "training-progress" or "none". By default, calling train opens Reinforcement Learning Training Monitor, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. For more information, see train. To turn off this display, set this option to "none".
Example: Plots="none"
UseParallel
— Option to use parallel training
false (default) | true
Option to use parallel training, specified as a logical value. Setting this option to true configures training to use multiple processes (which can run on different cores, processors, computer clusters, or cloud resources) to simulate the environment. This option scales up the number of simulations with the environment, and can speed up the generation of data for learning.
To specify options for parallel training, use the ParallelizationOptions property.
Note that if you want to speed up deep neural network calculations (such as gradient computation, parameter update, and prediction) using a local GPU, you do not need to set UseParallel to true. Instead, when creating your actor or critic, set its UseDevice option to "gpu" instead of "cpu".
Using parallel computing or the GPU requires Parallel Computing Toolbox™ software. Using computer clusters or cloud resources additionally requires MATLAB Parallel Server™. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.
Example: UseParallel=true
ParallelizationOptions
— Options for parallel training
ParallelTraining object
Options for parallel training, specified as a ParallelTraining object. For more information about training using parallel computing, see Train Agents Using Parallel Computing and GPUs.
The ParallelTraining object has the following properties, which you can modify using dot notation after creating the rlTrainingOptions object.
Mode
— Parallel computing mode
"sync" (default) | "async"
Parallel computing mode, specified as one of the following:
"sync": Use parpool to run synchronous training on the available workers. In this case, each worker pauses execution until all workers are finished. The parallel pool client updates the actor and critic parameters based on the results from all the workers and sends the updated parameters to all workers. When training a PG agent using gradient-based parallelization, Mode must be set to "sync".
"async": Use parpool to run asynchronous training on the available workers. In this case, each worker sends its data back to the parallel pool client as soon as it finishes and then receives updated parameters from the client. The worker then continues with its task.
Example: Mode="async"
WorkerRandomSeeds
— Randomizer initialization for workers
-1 (default) | -2 | vector
Randomizer initialization for workers, specified as one of the following:
-1: Assign a unique random seed to each worker. The value of the seed is the worker ID.
-2: Do not assign a random seed to the workers.
Vector: Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.
Example: WorkerRandomSeeds=[1 2 3 4]
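When you specify seeds manually, the vector length has to track the pool size. The following sketch builds one distinct seed per worker; numWorkers and baseSeed are hypothetical placeholders, and in practice numWorkers would come from your parallel pool (for example, the NumWorkers property of the pool returned by gcp).

```matlab
% Hypothetical setup: one explicit, distinct seed per worker.
numWorkers  = 4;      % placeholder; must equal the number of pool workers
baseSeed    = 1000;   % arbitrary starting seed
workerSeeds = baseSeed + (0:numWorkers-1);
```

You could then assign trainOpts.ParallelizationOptions.WorkerRandomSeeds = workerSeeds.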
TransferBaseWorkspaceVariables
— Option to send model and workspace variables to parallel workers
"on" (default) | "off"
Option to send model and workspace variables to parallel workers, specified as "on" or "off". When the option is "on", the client sends to the workers the variables defined in the base MATLAB workspace and used in the approximation models.
Example: TransferBaseWorkspaceVariables="off"
AttachedFiles
— Additional files to attach to the parallel pool
[] (default) | string | string array
Additional files to attach to the parallel pool, specified as a string or string array.
Example: AttachedFiles="myInitFile.m"
SetupFcn
— Function to run before training starts
[] (default) | function handle
Function to run before training starts, specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.
Example: SetupFcn=@mySetupFcn
CleanupFcn
— Function to run after training ends
[] (default) | function handle
Function to run after training ends, specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.
Example: CleanupFcn=@myCleanupFcn
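Both SetupFcn and CleanupFcn take handles to functions with no input arguments. As a hypothetical pairing, the following sketch silences one warning identifier on each worker before training and restores it afterward; the choice of warning is only an illustration.

```matlab
% Hypothetical setup and cleanup functions for the workers. Both take no
% input arguments, as required above.
warnId       = 'MATLAB:singularMatrix';
mySetupFcn   = @() warning('off', warnId);   % run once per worker before training
myCleanupFcn = @() warning('on',  warnId);   % run after training ends
```

You could then assign trainOpts.ParallelizationOptions.SetupFcn = mySetupFcn and trainOpts.ParallelizationOptions.CleanupFcn = myCleanupFcn.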
Object Functions
train | Train reinforcement learning agents within a specified environment
Examples
Configure Options for Training
Create an options set for training a reinforcement learning agent. Set the maximum number of episodes and the maximum number of steps per episode to 1000. Configure the options to stop training when the average reward equals or exceeds 480, and turn on both the command-line display and Reinforcement Learning Training Monitor for displaying training results. You can set the options using name-value pair arguments when you create the options set. Any options that you do not explicitly set have their default values.
trainOpts = rlTrainingOptions(...
    MaxEpisodes=1000,...
    MaxStepsPerEpisode=1000,...
    StopTrainingCriteria="AverageReward",...
    StopTrainingValue=480,...
    Verbose=true,...
    Plots="training-progress")
trainOpts = 
  rlTrainingOptions with properties:

    MaxEpisodes: 1000
    MaxStepsPerEpisode: 1000
    StopOnError: "on"
    SimulationStorageType: "memory"
    SaveSimulationDirectory: "savedSims"
    SaveFileVersion: "-v7"
    ScoreAveragingWindowLength: 5
    StopTrainingCriteria: "AverageReward"
    StopTrainingValue: 480
    SaveAgentCriteria: "none"
    SaveAgentValue: "none"
    SaveAgentDirectory: "savedAgents"
    Verbose: 1
    Plots: "training-progress"
    UseParallel: 0
    ParallelizationOptions: [1x1 rl.option.ParallelTraining]
Alternatively, create a default options set and use dot notation to change some of the values.
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 1000;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 480;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts
trainOpts = 
  rlTrainingOptions with properties:

    MaxEpisodes: 1000
    MaxStepsPerEpisode: 1000
    StopOnError: "on"
    SimulationStorageType: "memory"
    SaveSimulationDirectory: "savedSims"
    SaveFileVersion: "-v7"
    ScoreAveragingWindowLength: 5
    StopTrainingCriteria: "AverageReward"
    StopTrainingValue: 480
    SaveAgentCriteria: "none"
    SaveAgentValue: "none"
    SaveAgentDirectory: "savedAgents"
    Verbose: 1
    Plots: "training-progress"
    UseParallel: 0
    ParallelizationOptions: [1x1 rl.option.ParallelTraining]
You can now use trainOpts as an input argument to the train command.
Configure Parallel Computing Options for Training
To turn on parallel computing for training a reinforcement learning agent, set the UseParallel training option to true.
trainOpts = rlTrainingOptions(UseParallel=true);
To configure your parallel training, set the properties of the trainOpts.ParallelizationOptions object. For example, specify the asynchronous training mode:
trainOpts.ParallelizationOptions.Mode = "async";
trainOpts.ParallelizationOptions
ans = 
  ParallelTraining with properties:

    Mode: "async"
    WorkerRandomSeeds: -1
    TransferBaseWorkspaceVariables: "on"
    AttachedFiles: []
    SetupFcn: []
    CleanupFcn: []
You can now use trainOpts as an input argument to the train command to perform training with parallel computing.
Configure Options for A3C Training
To train an agent using the asynchronous advantage actor-critic (A3C) method, you must set the agent and parallel training options appropriately.
When creating the AC agent, set the NumStepsToLookAhead value to be greater than 1. Common values are 64 and 128.
agentOpts = rlACAgentOptions(NumStepsToLookAhead=64);
Use agentOpts when creating your agent. Alternatively, create your agent first and later modify its options, including the actor and critic options, using dot notation.
Configure the training algorithm to use asynchronous parallel training.
trainOpts = rlTrainingOptions(UseParallel=true);
trainOpts.ParallelizationOptions.Mode = "async";
You can now use trainOpts to train your AC agent using the A3C method.
For an example of asynchronous advantage actor-critic agent training, see Train AC Agent to Balance Discrete Cart-Pole System Using Parallel Computing.
Version History
Introduced in R2019a
R2022a: Training Parallelization Options: DataToSendFromWorkers and StepsUntilDataIsSent properties are no longer active
The property DataToSendFromWorkers of the ParallelizationOptions object is no longer active and will be removed in a future release. The data sent from the workers to the learner is now automatically determined based on agent type.
The property StepsUntilDataIsSent of the ParallelizationOptions object is no longer active and will be removed in a future release. Data is now sent from the workers to the learner at the end of each episode.
R2022a: rlTrainingOptions is not recommended for multi-agent training
rlTrainingOptions is not recommended for concurrently training agents in a multi-agent environment. Use rlMultiAgentTrainingOptions instead.
rlMultiAgentTrainingOptions is specifically built for multi-agent reinforcement learning, and allows you to group agents according to a common learning strategy and specify whether their learning is centralized (that is, all agents in a group share experiences) or decentralized (agents do not share experiences), whereas rlTrainingOptions only allows for decentralized learning.