Authors: Adrien Couëtoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, Nicolas Bonnard
DOI: 10.1007/978-3-642-25566-3_32
Keywords:
Abstract: Upper Confidence Trees are a very efficient tool for solving Markov Decision Processes; originating in difficult games like the game of Go, it is in particular surprisingly efficient in high dimensional problems. It is known that it can be adapted to continuous domains in some cases (in particular continuous action spaces). We here present an extension of Upper Confidence Trees to continuous stochastic problems. We (i) show a deceptive problem on which the classical Upper Confidence Tree approach does not work, even with arbitrarily large computational power and with progressive widening; (ii) propose an improvement, termed double-progressive widening, which takes care of the compromise between variance (we want infinitely many simulations for each action/state) and bias (we want sufficiently many nodes to avoid a bias by the first nodes) and which extends the classical progressive widening; (iii) discuss its consistency and show experimentally that it performs well on the deceptive problem and on experimental benchmarks. We guess that the double-progressive widening trick can be used for other algorithms as well, as a general tool for ensuring a good bias/variance compromise in search algorithms.