Technical Note: Q-Learning

Authors: Christopher J. C. H. Watkins, Peter Dayan

DOI: 10.1007/BF00992698


Abstract: Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989). We show that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely. We also sketch extensions to the cases of non-discounted, but absorbing, Markov environments, and where many Q values can be changed each iteration, rather than just one.
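
The incremental update the abstract describes can be sketched as a tabular backup toward the one-step target r + γ max_a' Q(s', a'). The sketch below is illustrative only: the state/action sizes, reward handling, and the 1/n learning-rate schedule are assumptions chosen to satisfy the usual convergence conditions, not details taken from the paper.

```python
import numpy as np

# Illustrative tabular Q-learning sketch (sizes, rewards, and the
# learning-rate schedule are assumptions for demonstration).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))       # action-value estimates Q(s, a)
counts = np.zeros((n_states, n_actions))  # visit counts per (state, action)
gamma = 0.9                               # discount factor

def update(s, a, r, s_next):
    """One Q-learning backup: move Q[s, a] toward r + gamma * max_a' Q[s_next, a']."""
    counts[s, a] += 1
    alpha = 1.0 / counts[s, a]            # decaying step size (sum alpha = inf, sum alpha^2 < inf)
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```

With repeated sampling of every state-action pair and a step size decaying in this way, the estimates converge to the optimal action-values with probability 1, which is the content of the convergence theorem the paper proves.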

References (4)
M. Sato, K. Abe, H. Takeda, "Learning control of finite Markov chains with an explicit trade-off between estimation and control," IEEE Transactions on Systems, Man, and Cybernetics, vol. 18, pp. 677–684 (1988). DOI: 10.1109/21.21595
Richard E. Bellman, Stuart E. Dreyfus, Applied Dynamic Programming, Princeton University Press (1962). DOI: 10.1515/9781400874651
Harold J. Kushner, Dean S. Clark, Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences (1978). DOI: 10.1007/978-1-4684-9352-8