Convergence of Stochastic Iterative Dynamic Programming Algorithms

作者： Tommi Jaakkola , Michael Jordan , Satinder Singh , None

DOI: 10.1162/NECO.1994.6.6.1185

关键词: Computer programming 、 Stochastic process 、 Convergence (routing) 、 Class (set theory) 、 Dynamic programming 、 Reinforcement learning 、 Mathematics 、 Markov process 、 Algorithm 、 Stochastic approximation

摘要: Recent developments in the area of reinforcement learning have yielded a number new algorithms for prediction and control Markovian environments. These algorithms, including TD(λ) algorithm Sutton (1988) Q-learning Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide rigorous proof convergence these DP-based by relating them powerful techniques stochastic approximation theory via theorem. The theorem establishes general class convergent which both belong.

参考文章(19)

Christopher J. C. H. Watkins, Peter Dayan, Technical Note : \cal Q -Learning Machine Learning. ,vol. 8, pp. 279- 292 ,(1992) , 10.1007/BF00992698

Aryeh Dvoretzky, On Stochastic Approximation Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics. pp. 39- 55 ,(1956)

Dimitri P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models ,(1987)

J.N. Tsitsiklis, D.P. Bertsekas, Parallel and distributed computation Old Tappan, NJ (USA); Prentice Hall Inc.. ,(1989)

John N. Tsitsiklis, Dimitri P. Bertsekas, Parallel and Distributed Computation: Numerical Methods ,(1989)

Peter Dayan, Terrence J. Sejnowski, TD(λ) Converges with Probability 1 Machine Learning. ,vol. 14, pp. 295- 301 ,(1994) , 10.1023/A:1022657612745

Sheldon M. Ross, Applied Probability Models with Optimization Applications ,(1970)

Herbert Robbins, Sutton Monro, A Stochastic Approximation Method Annals of Mathematical Statistics. ,vol. 22, pp. 400- 407 ,(1951) , 10.1214/AOMS/1177729586

Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh, Learning to act using real-time dynamic programming Artificial Intelligence. ,vol. 72, pp. 81- 138 ,(1995) , 10.1016/0004-3702(94)00011-O

10.

Peter W. Glynn, Optimization of stochastic systems Proceedings of the 18th conference on Winter simulation - WSC '86. pp. 52- 59 ,(1986) , 10.1145/318242.318260

Convergence of Stochastic Iterative Dynamic Programming Algorithms

来源期刊

我的账户

Convergence of Stochastic Iterative Dynamic Programming Algorithms

来源期刊

相似文章 10

我的账户