Understanding sampling-based adversarial search methods

作者: Bart Selman , Raghuram Ramanujan , Ashish Sabharwal

DOI:

关键词: Adversarial systemAmateurContrast (statistics)HeuristicSampling (statistics)Noise (video)Computer scienceMinimaxArtificial intelligenceMonte Carlo tree search

摘要: Until 2007, the best computer programs for playing board game Go performed at level of a weak amateur, while employing same Minimax algorithm that had proven so successful in other games such as Chess and Checkers. Thanks to revolutionary new sampling-based planning approach named Upper Confidence bounds applied Trees (UCT), today's play master on full-sized 19 × boards. Intriguingly, UCT's spectacular success has not been replicated domains have traditional stronghold Minimax-style approaches. The focus this thesis is understanding phenomenon. We begin with thorough examination various facets UCT Mancala, where we can contrast behavior better understood approach. then introduce notion shallow search traps — positions short winning strategies opposing player exist demonstrate these are distributed very differently different games, significant impact performance UCT. Finally, study two novel synthetic settings permit mathematical analysis. show relatively robust misleading heuristic feedback if noise samples independently drawn, whereas systematic biases cause prematurely “freeze” onto sub-optimal lines thus perform poorly. conclude discussion potential avenues future work.

参考文章(75)
Jonathan Schaeffer, Experiments in search and knowledge ICGA Journal. ,vol. 9, pp. 156- 156 ,(1986) , 10.3233/ICG-1986-9309
Yngvi Björnsson, Tony Marsland, From minimax to Manhattan national conference on artificial intelligence. pp. 31- 36 ,(1997)
Gian Piero Favini, Paolo Ciancarini, Monte Carlo tree search techniques in the game of Kriegspiel international joint conference on artificial intelligence. pp. 474- 479 ,(2009)
Alan Fern, Radha-Krishna Balla, UCT for tactical assault planning in real-time strategy games international joint conference on artificial intelligence. pp. 40- 45 ,(2009)
John Willard Milnor, Games Against Nature RAND Corporation. ,(1951)
Darse Billings, Algorithms and assessment in computer poker University of Alberta. ,(2006)
David McAllester, Denize Yuret, Alpha-Beta-Conspiracy Search. ICGA Journal. ,vol. 25, pp. 16- 35 ,(2002) , 10.3233/ICG-2002-25103
Richard J. Lorentz, Amazons Discover Monte-Carlo Computers and Games. pp. 13- 24 ,(2008) , 10.1007/978-3-540-87608-3_2
Rémi Coulom, COMPUTING “ELO RATINGS” OF MOVE PATTERNS IN THE GAME OF GO computer games. ,vol. 30, pp. 198- 208 ,(2007) , 10.3233/ICG-2007-30403
Michael Kearns, Yishay Mansour, Andrew Y. Ng, A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes Machine Learning. ,vol. 49, pp. 193- 208 ,(2002) , 10.1023/A:1017932429737