Authors: Scott Niekum, Peter Stone, Elad Liebman, Piyush Khandelwal
DOI:
Keywords:
Abstract: Over the past decade, Monte Carlo Tree Search (MCTS), and specifically Upper Confidence Bound in Trees (UCT), have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are back-propagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new variants. We demonstrate that on some benchmarks from the International Planning Competition (IPC), selecting a variant with a backup strategy different from averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies perform well in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.
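To make the contrast concrete, the sketch below illustrates what a backup strategy is in MCTS: after a simulation, the sampled return is propagated up the visited path and used to update each node's value estimate. The first function is the standard Monte Carlo averaging backup; the second is a hypothetical lambda-return-style variant in the spirit of the RL return strategies the abstract refers to (the exact update rules of the paper's 4 variants are not given here, so the blending step is an illustrative assumption, not the paper's method).

```python
def backup_average(path, terminal_value, gamma=1.0):
    """Standard Monte Carlo backup: each node's value is the running
    average of the discounted returns sampled below it (plain UCT).

    `path` is a list of (node, reward) pairs from root to leaf, where
    `reward` is the immediate reward received on entering that node and
    each node is a dict with visit count "n" and value estimate "q".
    """
    ret = terminal_value
    for node, reward in reversed(path):
        ret = reward + gamma * ret          # discounted return from this node
        node["n"] += 1
        node["q"] += (ret - node["q"]) / node["n"]  # incremental average

def backup_lambda(path, terminal_value, gamma=1.0, lam=0.5):
    """Illustrative lambda-return-style backup (an assumption, not the
    paper's exact rule): before passing the return further up, blend it
    with the node's current value estimate, so deeper sample noise is
    partially replaced by the tree's own bootstrapped estimates."""
    ret = terminal_value
    for node, reward in reversed(path):
        ret = reward + gamma * ret
        node["n"] += 1
        node["q"] += (ret - node["q"]) / node["n"]
        # Bootstrapping step: lam=1 recovers pure Monte Carlo averaging.
        ret = lam * ret + (1 - lam) * node["q"]

# Usage: one simulated episode root -> child with terminal value 2.0.
root = {"n": 0, "q": 0.0}
child = {"n": 0, "q": 0.0}
backup_average([(root, 0.0), (child, 1.0)], terminal_value=2.0)
# child sees return 1 + 2 = 3; root sees 0 + 3 = 3
```

The point of the comparison is that only the propagation rule differs; the tree policy and simulations are unchanged, which is why swapping the backup strategy is a cheap but, per the abstract, sometimes substantially better design choice.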