PAC Bounds for Multi-armed Bandit and Markov Decision Processes

Authors: Eyal Even-Dar, Shie Mannor, Yishay Mansour

Abstract: … for the multi-armed bandit problem we improve the de… sophisticated multi-armed bandit algorithm, one can improve the … Our algorithm uses any (ϵ, δ)-PAC Multi-armed bandit algorithm …
