Convergence of Stochastic Iterative Dynamic Programming Algorithms

作者: Tommi Jaakkola , Michael Jordan , Satinder Singh , None

DOI: 10.1162/NECO.1994.6.6.1185

关键词: Computer programmingStochastic processConvergence (routing)Class (set theory)Dynamic programmingReinforcement learningMathematicsMarkov processAlgorithmStochastic approximation

摘要: Recent developments in the area of reinforcement learning have yielded a number new algorithms for prediction and control Markovian environments. These algorithms, including TD(λ) algorithm Sutton (1988) Q-learning Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide rigorous proof convergence these DP-based by relating them powerful techniques stochastic approximation theory via theorem. The theorem establishes general class convergent which both belong.

参考文章(19)
Christopher J. C. H. Watkins, Peter Dayan, Technical Note : \cal Q -Learning Machine Learning. ,vol. 8, pp. 279- 292 ,(1992) , 10.1007/BF00992698
Aryeh Dvoretzky, On Stochastic Approximation Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. pp. 39- 55 ,(1956)
J.N. Tsitsiklis, D.P. Bertsekas, Parallel and distributed computation Old Tappan, NJ (USA); Prentice Hall Inc.. ,(1989)
John N. Tsitsiklis, Dimitri P. Bertsekas, Parallel and Distributed Computation: Numerical Methods ,(1989)
Peter Dayan, Terrence J. Sejnowski, TD(λ) Converges with Probability 1 Machine Learning. ,vol. 14, pp. 295- 301 ,(1994) , 10.1023/A:1022657612745
Herbert Robbins, Sutton Monro, A Stochastic Approximation Method Annals of Mathematical Statistics. ,vol. 22, pp. 400- 407 ,(1951) , 10.1214/AOMS/1177729586
Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh, Learning to act using real-time dynamic programming Artificial Intelligence. ,vol. 72, pp. 81- 138 ,(1995) , 10.1016/0004-3702(94)00011-O
Peter W. Glynn, Optimization of stochastic systems Proceedings of the 18th conference on Winter simulation - WSC '86. pp. 52- 59 ,(1986) , 10.1145/318242.318260