Policies and Value Functions
A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to be taken. A value function is a mapping from an environment observation (or observation-action pair) to the value of a policy, that is, the expected cumulative long-term reward obtained by following that policy. During training, the agent tunes the parameters of its policy and value function approximators to maximize the long-term reward.
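To make these two mappings concrete, the following is a minimal, language-agnostic sketch in Python using lookup tables. The state names, action names, and numeric values are purely illustrative and are not part of Reinforcement Learning Toolbox, which represents these mappings with approximator objects instead.

```python
import random

# A policy maps an observation to a probability distribution over actions.
# (Hypothetical two-state, two-action example.)
policy = {
    "low_battery":  {"recharge": 0.9, "explore": 0.1},
    "full_battery": {"recharge": 0.1, "explore": 0.9},
}

# A state-value function maps an observation to the expected
# cumulative long-term reward under the current policy.
value = {"low_battery": 2.0, "full_battery": 5.0}

def sample_action(obs):
    """Draw an action from the policy's distribution for this observation."""
    actions = list(policy[obs].keys())
    weights = list(policy[obs].values())
    return random.choices(actions, weights=weights, k=1)[0]
```

An action-value (Q) function is the analogous table keyed by observation-action pairs rather than by observations alone.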
Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor learns the policy that selects the best action to take. The critic learns the value (or Q-value) function that estimates the value of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
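The interplay between actor and critic can be sketched with a one-step tabular actor-critic update: the critic's temporal-difference (TD) error drives both the critic's value estimate and the actor's action preferences. This is a conceptual illustration under assumed table-based parameters; Reinforcement Learning Toolbox agents perform the analogous updates on neural-network or other approximator parameters.

```python
import math

def softmax(pref_dict):
    """Convert action preferences into a probability distribution."""
    m = max(pref_dict.values())
    exps = {a: math.exp(p - m) for a, p in pref_dict.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def actor_critic_step(prefs, V, s, a, r, s_next,
                      alpha=0.1, beta=0.1, gamma=0.99):
    """One TD(0) actor-critic update.

    prefs: dict state -> dict action -> preference (actor parameters)
    V:     dict state -> value estimate        (critic parameters)
    """
    td_error = r + gamma * V[s_next] - V[s]   # critic's TD error
    V[s] += alpha * td_error                  # critic update

    # Actor update: raise the preference for the taken action
    # (relative to its current probability) when the TD error is positive.
    probs = softmax(prefs[s])
    for act in prefs[s]:
        grad = (1.0 if act == a else 0.0) - probs[act]
        prefs[s][act] += beta * td_error * grad
    return td_error
```

Here `alpha` and `beta` are the critic and actor learning rates and `gamma` is the discount factor; all are illustrative defaults, not Toolbox option names.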
Blocks
Policy | Reinforcement learning policy
Topics
- Create Policies and Value Functions
Specify policies and value functions using function approximators, such as deep neural networks.
- Import Neural Network Models
You can import existing policies from other deep learning frameworks using the ONNX™ model format.