PAC Bounds for Multi-armed Bandit and Markov Decision Processes

Authors: Eyal Even-Dar, Shie Mannor, Yishay Mansour

DOI: 10.1007/3-540-45435-7_18

Keywords:

Abstract: … for the multi-armed bandit problem we improve the de… sophisticated multi-armed bandit algorithm, one can improve the … Our algorithm uses any (ϵ, δ)-PAC Multi-armed bandit algorithm …
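The abstract refers to (ϵ, δ)-PAC multi-armed bandit algorithms: after sampling, such an algorithm returns an arm whose expected reward is within ϵ of the best arm's, with probability at least 1 − δ. The Python sketch below illustrates the idea with the simple uniform-sampling baseline. It is not the paper's improved algorithm, whose details are elided in the abstract above. It assumes rewards in [0, 1], models arms as hypothetical zero-argument callables, and derives the per-arm sample size from Hoeffding's inequality plus a union bound over the n arms. The function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def naive_pac_sample_size(n_arms: int, epsilon: float, delta: float) -> int:
    """Pulls per arm so that every empirical mean lies within epsilon/2 of its
    true mean with probability >= 1 - delta.

    Assumption: rewards lie in [0, 1]. Hoeffding gives
    P(|mean_hat - mean| >= eps/2) <= 2 exp(-m eps^2 / 2); setting this to
    delta / n_arms and taking a union bound yields m >= (2/eps^2) ln(2n/delta).
    """
    return math.ceil((2.0 / epsilon ** 2) * math.log(2.0 * n_arms / delta))


def naive_pac_best_arm(arms: Sequence[Callable[[], float]],
                       epsilon: float, delta: float) -> int:
    """Sample every arm equally often and return the empirically best index.

    If all empirical means are within epsilon/2 of the true means, the chosen
    arm's true mean is at least (best true mean - epsilon).
    """
    m = naive_pac_sample_size(len(arms), epsilon, delta)
    means: List[float] = []
    for pull in arms:
        total = sum(pull() for _ in range(m))
        means.append(total / m)
    return max(range(len(arms)), key=lambda i: means[i])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical Bernoulli arms, for illustration only.
    arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
    print(naive_pac_sample_size(3, 0.1, 0.05))  # 958
    print(naive_pac_best_arm(arms, 0.1, 0.05))  # index of the p = 0.8 arm
```

This baseline pulls every arm O((1/ϵ²) log(n/δ)) times, for O((n/ϵ²) log(n/δ)) pulls in total. The log(n) factor comes from the union bound, and it is the kind of cost a more sophisticated PAC bandit algorithm would aim to reduce.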

References (22)
Christopher J. C. H. Watkins, Peter Dayan. Technical Note: Q-Learning. Machine Learning, vol. 8, pp. 279-292 (1992). DOI: 10.1007/BF00992698
Joseph Mark Gani, I. Vincze, K. Sarkadi. Progress in Statistics (1974).
Martin Anthony, Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press (1999).
Christopher J. C. H. Watkins, Peter Dayan. Technical Note: Q-Learning. Machine Learning, vol. 8, pp. 279-292 (1992). DOI: 10.1023/A:1022676722315
John N. Tsitsiklis, Dimitri P. Bertsekas. Neuro-Dynamic Programming (1996).
Herbert Robbins. Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society, vol. 58, pp. 527-535 (1952). DOI: 10.1090/S0002-9904-1952-09620-8
T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics, vol. 6, pp. 4-22 (1985). DOI: 10.1016/0196-8858(85)90002-8