Authors: Olivier Teytaud, Jean-Baptiste Hoock, Fabien Teytaud, David Lupien St-Pierre, Jialin Liu
DOI:
Keywords: Machine learning, Computer science, Test (assessment), Artificial intelligence, Portfolio
Abstract: A recent research trend in Artificial Intelligence (AI) is the combination of several programs into one single, stronger program; this is termed portfolio methods. We here investigate the application of such methods to Game Playing Programs (GPPs). In addition, we consider the case in which only one GPP is available, by decomposing this single GPP into several ones through the use of parameters or even simply random seeds. These portfolio methods are trained in a learning phase. We propose two different offline approaches. The simplest one, BestArm, is a straightforward optimization of seeds or parameters; it performs quite well against the original GPP, but poorly against an opponent who repeats games and learns. The second one, namely Nash-portfolio, performs similarly in a "one game" test, and is much more robust against an opponent who learns. We also propose an online learning portfolio, which tests several of the GPPs repeatedly and progressively switches to the best one using a bandit algorithm.
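As an illustration only, the following minimal Python sketch shows one way the two simplest ideas in the abstract could look: an offline BestArm-style choice of the seed with the best average result, and an online choice driven by a bandit rule. The abstract does not specify which bandit algorithm is used, so UCB1 here is an assumption, and the function names `best_arm` and `ucb1_portfolio`, the result matrix layout, and the toy reward values are all hypothetical. The Nash-portfolio variant is not sketched.

```python
import math
from typing import Callable, List, Sequence


def best_arm(win_matrix: Sequence[Sequence[float]]) -> int:
    """Offline selection (assumed layout): win_matrix[i][j] is the result of
    our seed i against opponent seed j (1.0 = win, 0.0 = loss).
    Returns the index of the seed with the highest mean result."""
    means = [sum(row) / len(row) for row in win_matrix]
    return max(range(len(means)), key=means.__getitem__)


def ucb1_portfolio(play: Callable[[int], float], n_arms: int, n_games: int) -> List[int]:
    """Online selection with UCB1 (an assumed choice of bandit algorithm).
    play(arm) plays one game with that seed/variant and returns a reward in [0, 1].
    Returns the number of games played with each arm."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, n_games + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once before using the UCB rule
        else:
            # pick the arm with the largest mean reward plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        counts[arm] += 1
        totals[arm] += play(arm)
    return counts


# Toy data (made up for the example): three seeds against three opponent seeds.
toy_results = [
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0],
]
chosen = best_arm(toy_results)  # seed 1 has the highest row mean

# Deterministic toy rewards stand in for real game outcomes.
toy_rewards = [0.2, 0.9, 0.5]
counts = ucb1_portfolio(lambda a: toy_rewards[a], n_arms=3, n_games=1000)
```

In this sketch the online selector spends most of its games on the arm with the highest reward while still occasionally re-testing the others, which is the "progressively switches to the best one" behaviour the abstract describes.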