ON THE ALMOST SURE RATE OF CONVERGENCE OF TEMPORAL-DIFFERENCE LEARNING ALGORITHMS

Author: Vladislav B. Tadić

DOI: 10.3182/20020721-6-ES-1901.01147

Abstract: In this paper, the almost sure rate of convergence of temporal-difference learning algorithms is analyzed. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite-dimensional state space. Under mild conditions, it is shown that these algorithms converge at the rate O(n^{-1/2} (log log n)^{1/2}) almost surely. Since O(n^{-1/2} (log log n)^{1/2}) characterizes the rate of convergence in the law of the iterated logarithm, the obtained results could be considered as the law of the iterated logarithm for temporal-difference learning algorithms. For the same reason, the obtained results are probably the least conservative result of this kind. The obtained results are illustrated with examples related to random coefficient autoregression models and M/G/1 queues.
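To make the rate in the abstract concrete, the following is a minimal sketch, not the paper's algorithm or experiment. It assumes tabular TD(0) on a hypothetical two-state Markov chain (a finite state space, which is a simplification of the finite-dimensional setting in the abstract), an illustrative step size a/(n + a) with a = 5, and made-up costs and transition probabilities. The names `true_values`, `lil_rate`, and `td0` are introduced here for illustration only. The script prints the final estimation error next to n^{-1/2} (log log n)^{1/2} so the two can be compared by eye.

```python
import math
import random


def true_values(P, cost, gamma, sweeps=2000):
    """Solve V = cost + gamma * P V by fixed-point iteration (a contraction for gamma < 1)."""
    V = [0.0] * len(cost)
    for _ in range(sweeps):
        V = [cost[i] + gamma * sum(P[i][j] * V[j] for j in range(len(V)))
             for i in range(len(V))]
    return V


def lil_rate(n):
    """The rate n^(-1/2) (log log n)^(1/2) quoted in the abstract; requires n >= 3."""
    return math.sqrt(math.log(math.log(n)) / n)


def td0(P, cost, gamma, n_steps, a=5.0, seed=0):
    """Tabular TD(0) along a single trajectory; returns the final value estimate.

    The step size a / (n + a) is an illustrative assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    states = list(range(len(cost)))
    V = [0.0] * len(cost)
    x = 0
    for n in range(1, n_steps + 1):
        # Sample the next state from row x of the transition matrix.
        x_next = rng.choices(states, weights=P[x])[0]
        step = a / (n + a)
        # Temporal-difference update for the discounted cost.
        V[x] += step * (cost[x] + gamma * V[x_next] - V[x])
        x = x_next
    return V


# Hypothetical chain, costs, and discount factor (assumptions for illustration).
P = [[0.9, 0.1], [0.2, 0.8]]
COST = [1.0, 2.0]
GAMMA = 0.5
N_STEPS = 100_000

v_true = true_values(P, COST, GAMMA)
v_hat = td0(P, COST, GAMMA, N_STEPS)
error = max(abs(est - ref) for est, ref in zip(v_hat, v_true))

print("max abs error:", error)
print("n^(-1/2) (log log n)^(1/2):", lil_rate(N_STEPS))
print("error / rate:", error / lil_rate(N_STEPS))
```

A single run cannot confirm an almost sure rate. Repeating the run over several values of `N_STEPS` and seeds and checking that `error / rate` stays bounded gives only an informal sanity check under the assumptions above.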
