Continuous upper confidence trees

Authors: Adrien Couëtoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, Nicolas Bonnard

DOI: 10.1007/978-3-642-25566-3_32

Abstract: Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, it is in particular surprisingly efficient in high dimensional problems. It is known that it can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach does not work, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double-progressive widening, which takes care of the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias by the first nodes) and which extends the classical progressive widening; (iii) discuss its consistency and show experimentally that it performs well on the deceptive problem and on experimental benchmarks. We guess that the double-progressive widening trick can be used for other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.
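
The double-progressive widening idea in the abstract can be illustrated with a minimal Python sketch of a single tree node. The abstract does not state the widening rule, so everything specific here is an assumption: the limit `ceil(c * n**alpha)` is a form often used for progressive widening, the UCB score is the standard UCB1 one, and the toy problem (action in [-1, 1], Gaussian transition noise, reward `-s*s`) and the names `widening_limit`, `ucb`, and `run` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def widening_limit(visits, c=1.0, alpha=0.5):
    # Assumed rule: a node visited `visits` times may hold at most
    # ceil(c * visits**alpha) children. c and alpha are illustrative values.
    return max(1, math.ceil(c * visits ** alpha))


def ucb(node, t):
    # Standard UCB1 score: empirical mean plus exploration bonus.
    return node["total"] / node["n"] + math.sqrt(2.0 * math.log(t) / node["n"])


def run(n_simulations, seed=0, alpha=0.5):
    """Simulate one decision node with widening on actions AND on outcomes."""
    rng = random.Random(seed)
    actions = {}  # action -> {"n": visits, "total": reward sum, "outcomes": {state: count}}
    for t in range(1, n_simulations + 1):
        # First widening (actions): add a new action only while under the limit,
        # otherwise pick among the existing actions by UCB.
        if len(actions) < widening_limit(t, alpha=alpha):
            a = rng.uniform(-1.0, 1.0)  # hypothetical continuous action space
            actions.setdefault(a, {"n": 0, "total": 0.0, "outcomes": {}})
        else:
            a = max(actions, key=lambda x: ucb(actions[x], t))
        node = actions[a]
        node["n"] += 1

        # Second widening (random outcomes): sample a new next state only while
        # under the limit, otherwise revisit a stored one in proportion to its count.
        outcomes = node["outcomes"]
        if len(outcomes) < widening_limit(node["n"], alpha=alpha):
            s = a + rng.gauss(0.0, 0.1)  # hypothetical noisy transition
            outcomes[s] = outcomes.get(s, 0) + 1
        else:
            keys = list(outcomes)
            s = rng.choices(keys, weights=[outcomes[k] for k in keys])[0]
            outcomes[s] += 1

        node["total"] += -s * s  # hypothetical reward, best near s = 0
    return actions
```

Calling `run(1000)` returns the per-action statistics. Under the assumed rule, both the number of actions and the number of stored outcomes per action grow sublinearly with the visit count, so each existing child keeps receiving simulations (limiting variance) while new children keep being added (limiting the bias from the first sampled nodes).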

References (14)
Hootan Nakhost, Martin Müller. Monte-Carlo exploration for deterministic planning. International Joint Conference on Artificial Intelligence, pp. 1766-1771 (2009).
Guido Sanguinetti, Neil D. Lawrence. Missing Data in Kernel PCA. Lecture Notes in Computer Science, pp. 751-758 (2006). DOI: 10.1007/11871842_76
Rémi Coulom. Computing "Elo Ratings" of Move Patterns in the Game of Go. computer games, vol. 30, pp. 198-208 (2007). DOI: 10.3233/ICG-2007-30403
Levente Kocsis, Csaba Szepesvári. Bandit Based Monte-Carlo Planning. Lecture Notes in Computer Science, pp. 282-293 (2006). DOI: 10.1007/11871842_29
Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. annual conference on computers, pp. 72-83 (2006). DOI: 10.1007/978-3-540-75538-8_7
Philippe Rolet, Michèle Sebag, Olivier Teytaud. Optimal robust expensive optimization is tractable. Proceedings of the 11th Annual conference on Genetic and evolutionary computation - GECCO '09, pp. 1951-1956 (2009). DOI: 10.1145/1569901.1570255
T.L. Lai, Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, vol. 6, pp. 4-22 (1985). DOI: 10.1016/0196-8858(85)90002-8
Sylvain Gelly, David Silver. Combining online and offline knowledge in UCT. International Conference on Machine Learning, pp. 273-280 (2007). DOI: 10.1145/1273496.1273531
Guillaume M. J-B. Chaslot, Mark H. M. Winands, H. Jaap van den Herik, Jos W. H. M. Uiterwijk, Bruno Bouzy. Progressive Strategies for Monte-Carlo Tree Search. New Mathematics and Natural Computation, vol. 4, pp. 343-357 (2008). DOI: 10.1142/S1793005708001094
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, vol. 3, pp. 397-422 (2003).