Confidence Backup Updates for Aggregating MDP State Values in Monte-Carlo Tree Search

Authors: Zahy Bnaya, Ariel Felner, Rami Puzis, Alon Palombo

DOI:

Keywords:

Abstract: Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS can use different strategies to aggregate these rewards and provide an estimation for the states' values. The most common aggregation method is to store the mean reward of all simulations. Another approach stores the best reward observed from each state. Both methods have complementary benefits and drawbacks. In this paper, we show that both methods are biased estimators of the real expected value of MDP states. We propose a hybrid method that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing domain show that our method has a considerable advantage when the rewards are drawn from a noisy distribution.
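The hybrid aggregation described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class `NodeStats`, the parameter `noise_threshold`, and the rule "use the best reward when the sample standard deviation is at or below a fixed threshold" are all assumptions made for the example, and the paper's actual confidence criterion may differ. The sketch also assumes that higher rewards are better. Running mean and variance are tracked with Welford's online update.

```python
import math


class NodeStats:
    """Running reward statistics for one state (hypothetical helper)."""

    def __init__(self):
        self.n = 0               # number of simulations seen
        self.mean = 0.0          # running mean reward
        self.m2 = 0.0            # running sum of squared deviations from the mean
        self.best = -math.inf    # best (largest) reward observed so far

    def update(self, reward):
        # Welford's online update for the mean and the squared deviations.
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)
        self.best = max(self.best, reward)

    def std(self):
        # Sample standard deviation; treated as infinite until two samples exist.
        if self.n < 2:
            return math.inf
        return math.sqrt(self.m2 / (self.n - 1))

    def value(self, noise_threshold=1.0):
        # Assumed rule: trust the best reward only when the samples look
        # low-noise, otherwise fall back to the mean.
        if self.std() <= noise_threshold:
            return self.best
        return self.mean
```

For example, feeding the rewards 10.0, 10.2, and 9.9 into a `NodeStats` instance gives a small sample deviation, so `value()` returns the best reward; feeding 0.0, 10.0, and 20.0 gives a large deviation, so `value()` returns the mean.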


