Chapter 21: Reinforcement Learning

Policy Gradient Theorem and Actor-Critic Architectures (Advanced)

How direct policy optimization turns delayed, noisy rewards into learning signals for continuous trading actions.
