Natural actor-critic algorithms

Authors: Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee

DOI: 10.1016/J.AUTOMATICA.2009.07.008

Keywords: Mathematics, Algorithm, Temporal difference learning, Mathematical optimization, Reinforcement learning, Bellman equation, Gradient method, Gradient descent, Function approximation, Stochastic gradient descent, Stochastic approximation

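Abstract: We present four new reinforcement learning algorithms based on actor-critic, natural-gradient, and function-approximation ideas, and we provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function-approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of special interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients. Our results extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar, and Schaal by providing the first convergence proofs and the first fully incremental algorithms.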
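
The abstract describes the actor-critic loop only in words. Below is a minimal sketch in plain Python of a natural-gradient actor-critic update pattern. It rests on several assumptions made purely for illustration: a hypothetical two-state, two-action toy MDP, a tabular critic in place of general linear function approximation, a softmax policy whose score vector psi = grad_theta log pi(a|s) serves as the compatible features, and hand-picked step sizes with the actor slower than the advantage estimator (beta < xi) to mimic a two-timescale scheme. The critic uses an average-reward TD error, the advantage parameters w take stochastic steps toward the solution of E[psi psi^T] w = E[delta psi], and the actor follows w as its natural-gradient estimate. It illustrates the ideas named in the abstract and is not a reproduction of any of the four algorithms in the paper.

```python
import math
import random

# Hypothetical toy MDP (an assumption, not from the paper): 2 states, 2 actions.
# Action 1 pays reward 1, action 0 pays 0; the next state is drawn uniformly.
N_STATES, N_ACTIONS = 2, 2


def step(state, action, rng):
    reward = 1.0 if action == 1 else 0.0
    return reward, rng.randrange(N_STATES)


def policy(theta, s):
    """Softmax (Gibbs) policy over the action logits theta[s]."""
    logits = theta[s]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def score(theta, s, a):
    """Compatible features psi = grad_theta log pi(a|s), flattened over (state, action)."""
    probs = policy(theta, s)
    psi = [0.0] * (N_STATES * N_ACTIONS)
    for b in range(N_ACTIONS):
        psi[s * N_ACTIONS + b] = (1.0 if b == a else 0.0) - probs[b]
    return psi


def natural_actor_critic(steps=5000, alpha_v=0.1, alpha_j=0.05, xi=0.05, beta=0.01, seed=0):
    """Hypothetical sketch; step sizes and update order are illustrative choices."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor (policy) parameters
    v = [0.0] * N_STATES                                    # critic: tabular state values
    w = [0.0] * (N_STATES * N_ACTIONS)                      # advantage parameters
    j = 0.0                                                 # average-reward estimate
    s = 0
    for _ in range(steps):
        probs = policy(theta, s)
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        r, s_next = step(s, a, rng)
        # Average-reward TD error, computed with the current critic.
        delta = r - j + v[s_next] - v[s]
        j += alpha_j * (r - j)
        v[s] += alpha_v * delta
        # w <- w + xi * (delta * psi - psi psi^T w), written as xi * psi * (delta - psi^T w).
        psi = score(theta, s, a)
        psi_w = sum(p * wi for p, wi in zip(psi, w))
        w = [wi + xi * (delta - psi_w) * p for wi, p in zip(w, psi)]
        # Actor: with compatible features, w estimates the natural gradient.
        for st in range(N_STATES):
            for b in range(N_ACTIONS):
                theta[st][b] += beta * w[st * N_ACTIONS + b]
        s = s_next
    return theta, j


if __name__ == "__main__":
    theta, j = natural_actor_critic()
    print([policy(theta, s)[1] for s in range(N_STATES)], j)
```

In this toy problem, running the script should push the probability of choosing action 1 toward one in both states and the average-reward estimate toward 1. A useful property of the compatible-feature construction is that the natural-gradient estimate w does not shrink as the policy becomes nearly deterministic, whereas a vanilla policy gradient would.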
