Chapter 21: Reinforcement Learning
Policy Gradient Theorem and Actor-Critic Architectures
How direct policy optimization turns delayed, noisy rewards into learning signals for continuous trading actions.
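As a rough illustration, the policy gradient theorem (Sutton et al., 2000) writes the gradient of expected return as an expectation of ∇θ log πθ(a|s) · Qπ(s,a). An actor-critic replaces Qπ with a learned critic, and the one-step TD error δ = r + γV(s') − V(s) then acts as a noisy, baseline-corrected estimate of the advantage. The sketch below is loosely in the spirit of Konda and Tsitsiklis (1999), not their exact algorithm. It trains a Gaussian policy over a continuous position size. Everything concrete in it is a hypothetical assumption for illustration: the AR(1) signal, the return and quadratic-cost reward model, all parameter values, the fixed exploration SIGMA, the linear critic on features (1, s²), and the function name train_actor_critic.

```python
import math
import random


def train_actor_critic(steps=20_000, seed=0):
    """One-step actor-critic on a HYPOTHETICAL toy trading problem (illustration only).

    State s_t: an AR(1) trading signal. Action a_t: a continuous position size.
    Reward: a_t * ret - COST * a_t**2, where ret = BETA * s_t + noise is the
    realised next-period return, so each reward is noisy and the value of
    future rewards reaches the actor only through the critic's TD error.
    """
    rng = random.Random(seed)
    BETA, COST, RHO = 0.5, 0.5, 0.9       # assumed market parameters
    RET_NOISE, GAMMA = 0.5, 0.9           # assumed return-noise std and discount
    SIGMA = 0.5                           # assumed fixed exploration std of the policy
    ALPHA_ACTOR, ALPHA_CRITIC = 0.002, 0.005

    theta = 0.0        # actor: Gaussian policy with mean mu(s) = theta * s
    w = [0.0, 0.0]     # critic: V(s) = w[0] * 1 + w[1] * s**2 (assumed features)

    def features(s):
        return (1.0, s * s)

    def value(s):
        f = features(s)
        return w[0] * f[0] + w[1] * f[1]

    s = rng.gauss(0.0, 1.0)  # start from the signal's stationary N(0, 1) law
    for _ in range(steps):
        mu = theta * s
        a = rng.gauss(mu, SIGMA)                      # sample a ~ N(mu, SIGMA^2)
        ret = BETA * s + rng.gauss(0.0, RET_NOISE)    # noisy next-period return
        r = a * ret - COST * a * a                    # P&L minus quadratic cost
        s_next = RHO * s + rng.gauss(0.0, math.sqrt(1.0 - RHO ** 2))

        # TD error: one-sample, baseline-corrected estimate of the advantage
        delta = r + GAMMA * value(s_next) - value(s)

        # Critic: semi-gradient TD(0) step on the linear features
        f = features(s)
        w[0] += ALPHA_CRITIC * delta * f[0]
        w[1] += ALPHA_CRITIC * delta * f[1]

        # Actor: score-function step; for N(theta*s, SIGMA^2),
        # d/dtheta log pi(a|s) = (a - mu) * s / SIGMA^2
        theta += ALPHA_ACTOR * delta * (a - mu) * s / SIGMA ** 2

        s = s_next
    return theta, w
```

Under these assumptions the action does not move the signal, so the expected per-step reward θβs² − c(θ²s² + σ²) is maximised at θ = β/(2c) = 0.5, and a call such as `theta, w = train_actor_critic()` should leave theta fluctuating around that value. For the same reason, the critic's terms in δ do not depend on the sampled action and so do not shift the expected direction of the actor update; their practical role is variance reduction.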
References
Actor-Critic Algorithms
Vijay Konda, John Tsitsiklis
(1999)
Advances in Neural Information Processing Systems 12, MIT Press
Recent Advances in Reinforcement Learning in Finance
Ben Hambly, Renyuan Xu, Huining Yang
(2023)
Mathematical Finance
Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
(2017)
arXiv:1707.06347
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour
(2000)
Advances in Neural Information Processing Systems 12, MIT Press