Authors: Zahy Bnaya, Ariel Felner, Rami Puzis, Alon Palombo
DOI:
Keywords:
Abstract: Monte-Carlo Tree Search (MCTS) algorithms estimate the value of MDP states based on rewards received by performing multiple random simulations. MCTS can use different strategies to aggregate these rewards and provide an estimation for states' values. The most common aggregation method is to store the mean reward of all simulations. Another approach stores the best reward observed from each state. Both methods have complementary benefits and drawbacks. In this paper, we show that both are biased estimators of the real expected value of states. We propose a hybrid approach that uses the best reward for states with low noise, and otherwise uses the mean. Experimental results on the Sailing domain show that our approach has a considerable advantage when rewards are drawn from a noisy distribution.
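The hybrid aggregation idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the noise test (sample standard deviation against a threshold) and the `noise_threshold` parameter are assumptions chosen for clarity.

```python
import statistics

def aggregate(rewards, noise_threshold=1.0):
    """Hybrid backup sketch: return the best observed reward when the
    sample noise (standard deviation) is low, otherwise fall back to
    the mean. `noise_threshold` is a hypothetical tuning parameter."""
    if len(rewards) < 2:
        return rewards[0]
    if statistics.stdev(rewards) < noise_threshold:
        # Low noise: the best observed reward is a reliable estimate.
        return max(rewards)
    # High noise: the mean is more robust to outlier simulations.
    return statistics.fmean(rewards)

# High spread -> mean; near-deterministic samples -> max
print(aggregate([10.0, 0.0, 5.0, 2.0]))  # prints 4.25 (mean)
print(aggregate([4.9, 5.0, 5.1]))        # prints 5.1 (best reward)
```

In a full MCTS implementation this rule would be applied during the backup phase, replacing the fixed mean (or max) update at each tree node.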