Policies and Value Functions
A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to be taken. A value function is a mapping from an environment observation (or observation-action pair) to the value of a policy, that is, the expected cumulative long-term reward obtained by following that policy. During training, the agent tunes the parameters of its policy and value function approximators to maximize the long-term reward.
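To make these two mappings concrete, the following is a minimal, language-agnostic sketch in Python using lookup tables. The state names, action names, and numeric values are purely illustrative and are not part of Reinforcement Learning Toolbox, which represents these mappings with approximator objects instead.

```python
import random

# A policy maps an observation to a probability distribution over actions.
# (Hypothetical two-state, two-action example.)
policy = {
    "low_battery":  {"recharge": 0.9, "explore": 0.1},
    "full_battery": {"recharge": 0.1, "explore": 0.9},
}

# A state-value function maps an observation to the expected
# cumulative long-term reward under the current policy.
value = {"low_battery": 2.0, "full_battery": 5.0}

def sample_action(obs):
    """Draw an action from the policy's distribution for this observation."""
    actions = list(policy[obs].keys())
    weights = list(policy[obs].values())
    return random.choices(actions, weights=weights, k=1)[0]
```

An action-value (Q) function is the analogous table keyed by observation-action pairs rather than by observations alone.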
Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor learns the policy that selects the best action to take. The critic learns the value (or Q-value) function that estimates the value of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
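The interplay between actor and critic can be sketched with a one-step tabular actor-critic update: the critic's temporal-difference (TD) error drives both the critic's value estimate and the actor's action preferences. This is a conceptual illustration under assumed table-based parameters; Reinforcement Learning Toolbox agents perform the analogous updates on neural-network or other approximator parameters.

```python
import math

def softmax(pref_dict):
    """Convert action preferences into a probability distribution."""
    m = max(pref_dict.values())
    exps = {a: math.exp(p - m) for a, p in pref_dict.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def actor_critic_step(prefs, V, s, a, r, s_next,
                      alpha=0.1, beta=0.1, gamma=0.99):
    """One TD(0) actor-critic update.

    prefs: dict state -> dict action -> preference (actor parameters)
    V:     dict state -> value estimate        (critic parameters)
    """
    td_error = r + gamma * V[s_next] - V[s]   # critic's TD error
    V[s] += alpha * td_error                  # critic update

    # Actor update: raise the preference for the taken action
    # (relative to its current probability) when the TD error is positive.
    probs = softmax(prefs[s])
    for act in prefs[s]:
        grad = (1.0 if act == a else 0.0) - probs[act]
        prefs[s][act] += beta * td_error * grad
    return td_error
```

Here `alpha` and `beta` are the critic and actor learning rates and `gamma` is the discount factor; all are illustrative defaults, not Toolbox option names.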
Blocks
Policy | Reinforcement learning policy
Topics
- Create Policies and Value Functions
Specify policies and value functions using function approximators, such as deep neural networks.
- Import Neural Network Models
You can import existing policies from other deep learning frameworks using the ONNX™ model format.