Confidence Backup Updates for Aggregating MDP State Values in Monte-Carlo Tree Search

Authors: Zahy Bnaya, Ariel Felner, Rami Puzis, Alon Palombo

Abstract: Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS can use different strategies to aggregate these rewards and provide an estimation for the states’ values. The most common aggregation method is to store the mean reward of all simulations. Another approach stores the best observed reward from each state. Both methods have complementary benefits and drawbacks. In this paper, we show that both methods are biased estimators of the real expected value of MDP states. We propose a hybrid approach that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing MDP domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.
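The three aggregation strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not say how noise is measured or how the switch is made, so the noise estimate (population standard deviation of the observed rewards) and the `noise_threshold` parameter are assumptions introduced only for illustration, as are the function names. Rewards are treated as values to maximize.

```python
from statistics import mean, pstdev


def mean_backup(rewards):
    """Average of all simulation rewards observed from a state."""
    return mean(rewards)


def max_backup(rewards):
    """Best (largest) reward observed from a state."""
    return max(rewards)


def hybrid_backup(rewards, noise_threshold=1.0):
    """Use the best reward when observed noise is low, else the mean.

    Assumption: noise is estimated as the population standard deviation
    of the rewards and compared with a hypothetical fixed threshold.
    """
    if len(rewards) < 2:
        # Too few samples to estimate noise; fall back to the mean.
        return mean_backup(rewards)
    if pstdev(rewards) <= noise_threshold:
        return max_backup(rewards)
    return mean_backup(rewards)


low_noise = [9.8, 10.0, 10.1, 9.9]    # tightly clustered rewards
high_noise = [2.0, 18.0, 5.0, 15.0]   # widely spread rewards

print(hybrid_backup(low_noise))   # falls in the low-noise branch: best reward
print(hybrid_backup(high_noise))  # falls in the high-noise branch: mean reward
```

With the example inputs, the tightly clustered rewards select the best observed reward, while the widely spread rewards fall back to the mean. A real implementation would likely need a more careful confidence criterion than a fixed threshold.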
