Automatically Reinforcing a Game AI

Authors: Olivier Teytaud, Jean-Baptiste Hoock, Fabien Teytaud, David Lupien St-Pierre, Jialin Liu

Keywords: Machine learning, Computer science, Test (assessment), Artificial intelligence, Portfolio

Abstract: A recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available, by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parameters; it performs quite well against the original GPP, but performs poorly against an opponent which repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a "one game" test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPPs repeatedly and progressively switches to the best one using a bandit algorithm.
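The contrast between the two offline approaches can be illustrated with a minimal sketch. It assumes the learning phase has already produced a matrix of win rates, where entry `wins[i][j]` is the win rate of our seed `i` against opponent seed `j`. The 3x3 matrix, the function names, and the use of fictitious play to approximate the Nash mixture are all assumptions made for illustration; the abstract does not say how the authors compute the Nash-portfolio.

```python
def best_arm(wins):
    """BestArm-style choice: the single seed with the highest average win rate."""
    means = [sum(row) / len(row) for row in wins]
    return max(range(len(wins)), key=lambda i: means[i])


def nash_portfolio(wins, iterations=10000):
    """Approximate a maximin mixture over seeds by fictitious play.

    Fictitious play is a stand-in solver chosen for brevity, not
    necessarily the method used in the paper.
    """
    n_rows, n_cols = len(wins), len(wins[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = col_counts[0] = 1  # arbitrary first move for both sides
    for _ in range(iterations):
        # Our side best-responds to the opponent's empirical mixture.
        row_payoff = [sum(wins[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        best_row = max(range(n_rows), key=lambda i: row_payoff[i])
        # The opponent best-responds (minimizes our win rate) to our mixture.
        col_payoff = [sum(wins[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        best_col = min(range(n_cols), key=lambda j: col_payoff[j])
        row_counts[best_row] += 1
        col_counts[best_col] += 1
    total = sum(row_counts)
    return [count / total for count in row_counts]


def worst_case(wins, mix):
    """Win rate of a mixture over seeds against the opponent seed that hurts it most."""
    return min(sum(mix[i] * wins[i][j] for i in range(len(wins)))
               for j in range(len(wins[0])))


# Hypothetical win-rate matrix (not data from the paper).
wins = [
    [0.5, 0.9, 0.2],
    [0.1, 0.5, 0.8],
    [0.8, 0.2, 0.5],
]

arm = best_arm(wins)
pure = [1.0 if i == arm else 0.0 for i in range(len(wins))]
mix = nash_portfolio(wins)

print("BestArm seed:", arm, "worst case:", worst_case(wins, pure))
print("Nash-portfolio mix:", mix, "worst case:", worst_case(wins, mix))
```

In this toy matrix the seed with the best average win rate still has one opponent seed that beats it most of the time, so an opponent who repeats games and learns can exploit it. The mixed strategy gives up nothing on average but has a much better worst case, which is the robustness property the abstract attributes to the Nash-portfolio.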
