Fast gradient-descent methods for temporal-difference learning with linear function approximation

Authors: Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver

DOI: 10.1145/1553374.1553501

Keywords:

Abstract: … In this section we derive two new algorithms as stochastic gradient descent in the projected Bellman error objective (5). We first establish some relationships between the relevant …

References (19)
Alborz Geramifard, Richard S. Sutton, Michael Bowling. Incremental least-squares temporal difference learning. National Conference on Artificial Intelligence, pp. 356-361 (2006).
Csaba Szepesvári, Hamid Reza Maei, Richard S. Sutton. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Neural Information Processing Systems, pp. 1609-1616 (2008).
Nathan R. Sturtevant, Adam M. White. Feature construction for reinforcement learning in Hearts. Annual Conference on Computers, pp. 122-134 (2006). DOI: 10.1007/978-3-540-75538-8_11
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta. Off-policy temporal difference learning with function approximation. International Conference on Machine Learning, pp. 417-424 (2001).
Leemon Baird. Residual algorithms: reinforcement learning with function approximation. Machine Learning Proceedings 1995, pp. 30-37 (1995). DOI: 10.1016/B978-1-55860-377-6.50013-X
Richard Sutton, Martin Müller, David Silver. Reinforcement learning of local shape in the game of Go. International Joint Conference on Artificial Intelligence, pp. 1053-1058 (2007).
Morris W. Hirsch. Convergent activation dynamics in continuous time networks. Neural Networks, vol. 2, pp. 331-349 (1989). DOI: 10.1016/0893-6080(89)90018-X
E. Barnard. Temporal-difference methods and Markov models. IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, pp. 357-365 (1993). DOI: 10.1109/21.229449
V. S. Borkar, S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization, vol. 38, pp. 447-469 (2000). DOI: 10.1137/S0363012997331639
Steven J. Bradtke, Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, vol. 22, pp. 33-57 (1996). DOI: 10.1007/BF00114723