ON THE ALMOST SURE RATE OF CONVERGENCE OF TEMPORAL-DIFFERENCE LEARNING ALGORITHMS

Author: Vladislav B. Tadić

DOI: 10.3182/20020721-6-ES-1901.01147


Abstract: In this paper, the almost sure rate of convergence of temporal-difference learning algorithms is analyzed. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, it is shown that these algorithms converge at the rate O(n^{-1/2}(log log n)^{1/2}) almost surely. Since O(n^{-1/2}(log log n)^{1/2}) characterizes the rate of convergence in the law of the iterated logarithm, the obtained results could be considered as a law of the iterated logarithm for these algorithms and, for this reason, are probably the least conservative results of this kind. The results are illustrated with examples related to random coefficient autoregression models and M/G/1 queues.
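The setting of the abstract can be illustrated with a minimal tabular TD(0) sketch for a discounted cost function on a finite Markov chain. This is an illustrative simplification, not the paper's construction: the two-state chain, the cost vector, and the step sizes a_n = 1/(n+1) below are assumptions chosen for the example.

```python
import numpy as np

def td0(P, c, gamma, n_steps, seed=0):
    """Tabular TD(0) with step sizes 1/(n+1) on a finite chain.

    P     -- transition matrix (rows sum to 1)
    c     -- per-state cost vector
    gamma -- discount factor in (0, 1)
    Returns the estimate of the discounted value function.
    """
    rng = np.random.default_rng(seed)
    n_states = P.shape[0]
    V = np.zeros(n_states)
    x = 0
    for n in range(n_steps):
        x_next = rng.choice(n_states, p=P[x])
        # temporal-difference update toward c(x) + gamma * V(x_next)
        delta = c[x] + gamma * V[x_next] - V[x]
        V[x] += delta / (n + 1)
        x = x_next
    return V

# Illustrative two-state chain; the exact value solves V* = c + gamma P V*.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
c = np.array([1.0, 0.0])
gamma = 0.5
V_exact = np.linalg.solve(np.eye(2) - gamma * P, c)
V_td = td0(P, c, gamma, n_steps=200_000)
err = np.max(np.abs(V_td - V_exact))
print(err)
```

Per the paper's result, the error of such iterates shrinks at the rate O(n^{-1/2}(log log n)^{1/2}) almost surely, so the printed error should be small after many steps; one can check it by halving and doubling `n_steps`.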

References (2)
Peter Dayan, Terrence J. Sejnowski, "TD(λ) Converges with Probability 1", Machine Learning, vol. 14, pp. 295–301, 1994. DOI: 10.1023/A:1022657612745
Tommi Jaakkola, Michael Jordan, Satinder Singh, "Convergence of Stochastic Iterative Dynamic Programming Algorithms", Advances in Neural Information Processing Systems, vol. 6, pp. 703–710, 1993. DOI: 10.1162/NECO.1994.6.6.1185