Automatically Reinforcing a Game AI

Authors: Olivier Teytaud, Jean-Baptiste Hoock, Fabien Teytaud, David Lupien St-Pierre, Jialin Liu

DOI:

Keywords: Machine learning, Computer science, Test (assessment), Artificial intelligence, Portfolio

Abstract: A recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available, by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parameters; it performs quite well against the original GPP, but performs poorly against an opponent who repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a "one game" test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPPs repeatedly and progressively switches to the best one, using a bandit algorithm.
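The two simplest ideas in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a hypothetical `win_matrix` in which entry `[i][j]` is 1 if seed `i` beat opponent seed `j` during the learning phase, and a hypothetical `play(arm)` callback that plays one game with the chosen variant and returns 1.0 for a win and 0.0 otherwise. The abstract only says "a bandit algorithm"; UCB1 is used here purely as one common choice. The Nash-portfolio is not covered.

```python
import math


def best_arm(win_matrix):
    """Offline selection: index of the seed with the highest average win rate.

    win_matrix[i][j] is assumed to be 1 if seed i beat opponent seed j, else 0.
    """
    means = [sum(row) / len(row) for row in win_matrix]
    return max(range(len(means)), key=means.__getitem__)


def ucb1_portfolio(play, n_arms, n_games):
    """Online selection with UCB1 (an assumed choice of bandit algorithm).

    play(arm) is a hypothetical callback returning a reward in [0, 1].
    Returns how many times each arm was selected.
    """
    counts = [0] * n_arms
    wins = [0.0] * n_arms
    for t in range(1, n_games + 1):
        if t <= n_arms:
            arm = t - 1  # try every arm once before using the UCB index
        else:
            arm = max(
                range(n_arms),
                key=lambda a: wins[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        wins[arm] += play(arm)
        counts[arm] += 1
    return counts


if __name__ == "__main__":
    # Toy learning-phase results for three seeds (made-up data).
    matrix = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print("BestArm picks seed", best_arm(matrix))

    # Toy opponent against which only variant 1 ever wins.
    counts = ucb1_portfolio(lambda arm: 1.0 if arm == 1 else 0.0, 3, 200)
    print("UCB1 selection counts:", counts)
```

In this toy setting the bandit concentrates its games on the only winning variant, which mirrors the "progressively switches to the best one" behaviour described above. A fixed BestArm choice has no such adaptation, which is consistent with the abstract's remark that it does poorly against an opponent who learns.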
