Motivating actor-critic

Actor-critic methods were among the earliest to be investigated in RL. In the 1990s they were supplanted by action-value methods such as Q-learning. These action-value methods directly estimate the value of each action in each state and choose the action with the highest estimated value. This approach was appealing for its simplicity, but it has been shown to have theoretical difficulties, such as a lack of convergence guarantees, when combined with function approximation.

