Author: Vladislav B. Tadić
DOI: 10.3182/20020721-6-ES-1901.01147
Keywords:
Abstract: In this paper, the almost sure rate of convergence of temporal-difference learning algorithms is analyzed. The analysis is carried out for the case of a discounted cost function associated with a Markov chain with a finite dimensional state-space. Under mild conditions, it is shown that these algorithms converge at the rate O(n^{-1/2} (log log n)^{1/2}) almost surely. Since O(n^{-1/2} (log log n)^{1/2}) characterizes the rate of convergence in the law of the iterated logarithm, the obtained results could be considered as the law of the iterated logarithm for temporal-difference learning algorithms. For the same reason, the obtained results are probably the least conservative results of this kind. The obtained results are illustrated with examples related to random coefficient autoregression models and M/G/1 queues.