Bellman equations occur in dynamic programming. A Bellman equation is also called an optimality equation or a dynamic programming equation. This approach was developed by Richard Bellman.
In reinforcement learning a Bellman equation refers to a recursion for expected rewards. For example, the expected reward for being in a particular state s and following some fixed policy π has the Bellman equation:
| Vπ(s) = R(s) + γ | ∑ | P(s' | s,π(s))Vπ(s') |
| s' | |
while the equation for the optimal policy is referred to as the Bellman optimality equation:
| V * (s) = R(s) + maxaγ | ∑ | P(s' | s,a)Vπ(s') |
| s' | |
the difference being that rather than taking the action prescribed by some policy π, we take the action that gives the best expected return.
Example
The recursive Bellman equation used to find a maximum of the dynamic programming problem:
such that
can be written as:
.
Here
is dependent on the state x, and
- y(x)
is the policy function .