Toward Off-Policy Learning Control with Function Approximation

Authors: Hamid R. Maei, Richard S. Sutton, Shalabh Bhatnagar, Csaba Szepesvári

Abstract: We present the first temporal-difference learning algorithm for off-policy control with unrestricted linear function approximation whose per-time-step complexity is linear in the number of features. Our algorithm, Greedy-GQ, is an extension of recent work on gradient temporal-difference learning, which has hitherto been restricted to a prediction (policy evaluation) setting, to a control setting in which the target policy is greedy with respect to a linear approximation to the optimal action-value function. A limitation of our control setting is that we require the behavior policy to be stationary. We call this setting latent learning because the optimal policy, though learned, is not manifest in behavior. Popular off-policy algorithms such as Q-learning are known to be unstable in this setting when used with linear function approximation.
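
The following is a minimal Python sketch of a Greedy-GQ-style per-time-step update under the assumptions stated in the abstract (linear action-value estimate q(s, a) = θ·φ(s, a), greedy target policy, fixed behavior policy). The function and parameter names (greedy_gq_step, phi_next_all, alpha, beta) are illustrative, not taken from the paper.

    import numpy as np

    def greedy_gq_step(theta, w, phi_sa, phi_next_all, reward, gamma, alpha, beta):
        # phi_sa:       feature vector of the current state-action pair
        # phi_next_all: one feature vector per action available in the next state
        # theta:        main weights; q(s, a) is estimated as theta @ phi(s, a)
        # w:            secondary weights used by the gradient-TD correction term

        # Greedy target policy: successor action that maximizes the current estimate.
        phi_bar = phi_next_all[np.argmax(phi_next_all @ theta)]

        # Temporal-difference error with respect to that greedy action.
        delta = reward + gamma * (phi_bar @ theta) - phi_sa @ theta

        # Main update plus the gradient-TD correction term; both updates cost
        # O(number of features) per time step.
        theta = theta + alpha * (delta * phi_sa - gamma * (w @ phi_sa) * phi_bar)
        w = w + beta * (delta - w @ phi_sa) * phi_sa
        return theta, w

In this sketch the secondary weight vector w tracks the projection of the TD error onto the features, which is what allows the update to remain stable when the behavior policy differs from the greedy target policy.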

References (19)
Csaba Szepesvári, Hamid Reza Maei, Richard S. Sutton. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Neural Information Processing Systems, pp. 1609–1616 (2008).
Damien Ernst, Arthur Louette. Introduction to Reinforcement Learning. MIT Press (1998).
Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming. Machine Learning Proceedings 1995, pp. 261–268 (1995). doi:10.1016/B978-1-55860-377-6.50040-2
Leemon Baird. Residual Algorithms: Reinforcement Learning with Function Approximation. Machine Learning Proceedings 1995, pp. 30–37 (1995). doi:10.1016/B978-1-55860-377-6.50013-X
Guisheng Zhai, Xuping Xu, Hai Lin, Anthony N. Michel. Analysis and design of switched normal systems. Nonlinear Analysis: Theory, Methods & Applications, vol. 65, pp. 2248–2259 (2006). doi:10.1016/J.NA.2006.01.034
Hamid Reza Maei, Richard S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. Artificial General Intelligence, pp. 100–105 (2010). doi:10.2991/AGI.2010.22
Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 993–1000 (2009). doi:10.1145/1553374.1553501