作者: Bart Selman , Raghuram Ramanujan , Ashish Sabharwal
DOI:
关键词: Adversarial system 、 Amateur 、 Contrast (statistics) 、 Heuristic 、 Sampling (statistics) 、 Noise (video) 、 Computer science 、 Minimax 、 Artificial intelligence 、 Monte Carlo tree search
摘要: Until 2007, the best computer programs for playing board game Go performed at level of a weak amateur, while employing same Minimax algorithm that had proven so successful in other games such as Chess and Checkers. Thanks to revolutionary new sampling-based planning approach named Upper Confidence bounds applied Trees (UCT), today's play master on full-sized 19 × boards. Intriguingly, UCT's spectacular success has not been replicated domains have traditional stronghold Minimax-style approaches. The focus this thesis is understanding phenomenon. We begin with thorough examination various facets UCT Mancala, where we can contrast behavior better understood approach. then introduce notion shallow search traps — positions short winning strategies opposing player exist demonstrate these are distributed very differently different games, significant impact performance UCT. Finally, study two novel synthetic settings permit mathematical analysis. show relatively robust misleading heuristic feedback if noise samples independently drawn, whereas systematic biases cause prematurely “freeze” onto sub-optimal lines thus perform poorly. conclude discussion potential avenues future work.