Neurohex: A Deep Q-learning Hex Agent

Authors: Ryan Hayward, Kenny Young, Gautham Vasan

DOI:

Keywords: Artificial neural network; Reinforcement learning; Initialization; Artificial intelligence; Action (philosophy); State space; Champion; Computer science; Q-learning; Olympiad

Abstract: DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents --- e.g. for Atari games via Q-learning and for the game of Go via reinforcement learning --- raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer CNN that plays Hex on the 13x13 board. Hex is the classic two-player alternate-turn stone-placement game, played on a rhombus of hexagonal cells, in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After weeks of Q-learning, NeuroHex achieves win-rates of 20.4% as first player and 2.1% as second player against a 1-second/move version of MoHex, the current ICGA Olympiad champion. Our data suggests that further improvement might be possible with more training time.
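To make the setup concrete, the sketch below illustrates the kind of search-free Q-learning agent the abstract describes: a fully convolutional network over the 13x13 board producing one Q-value per cell, plus a one-step target for self-play updates. This is not the authors' implementation; the framework (PyTorch), the 2-plane board encoding, the layer width, and the negamax-style target convention are all assumptions of the sketch, not details taken from the paper.

```python
# Minimal sketch (not the NeuroHex implementation) of a convolutional Q-network
# for 13x13 Hex and a one-step self-play Q-learning target. Framework, input
# encoding, and target convention are assumptions of this sketch.
import torch
import torch.nn as nn

BOARD = 13  # board size used in the paper

class HexQNet(nn.Module):
    """Fully convolutional Q-network: 11 conv layers, one Q-value per cell."""
    def __init__(self, in_planes=2, width=64, layers=11):
        super().__init__()
        blocks, c = [], in_planes
        for _ in range(layers - 1):
            blocks += [nn.Conv2d(c, width, kernel_size=3, padding=1), nn.ReLU()]
            c = width
        blocks += [nn.Conv2d(c, 1, kernel_size=1), nn.Tanh()]  # Q-values in [-1, 1]
        self.net = nn.Sequential(*blocks)

    def forward(self, boards):               # boards: (N, in_planes, 13, 13)
        return self.net(boards).flatten(1)   # (N, 169), one Q-value per move

def one_step_target(q_net, next_boards, rewards, terminal, legal_mask):
    """Negamax-style one-step target (an assumption of this sketch): the reward
    at terminal positions, otherwise minus the opponent's best reply value."""
    with torch.no_grad():
        q_next = q_net(next_boards).masked_fill(~legal_mask, -1.0)  # ignore illegal cells
        best_reply = q_next.max(dim=1).values
    return torch.where(terminal, rewards, -best_reply)

if __name__ == "__main__":
    net = HexQNet()
    boards = torch.zeros(4, 2, BOARD, BOARD)                    # batch of empty boards
    legal = torch.ones(4, BOARD * BOARD, dtype=torch.bool)
    r, done = torch.zeros(4), torch.zeros(4, dtype=torch.bool)
    print(net(boards).shape)                                    # torch.Size([4, 169])
    print(one_step_target(net, boards, r, done, legal).shape)   # torch.Size([4])
```

Producing one Q-value per cell ties the action space directly to board positions, which is what lets a trained network play with no search: the agent simply selects the legal cell with the highest Q-value.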

References (7)
Claude Shannon, Computers and Automata. Proceedings of the IRE, vol. 41, pp. 1234-1241 (1953). DOI: 10.1109/JRPROC.1953.274273
Vadim V. Anshelevich, The Game of Hex: An Automatic Theorem Proving Approach to Game Programming. National Conference on Artificial Intelligence, pp. 189-194 (2000)
Broderick Arneson, Ryan B. Hayward, Philip Henderson, Monte Carlo Tree Search in Hex. IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, pp. 251-258 (2010). DOI: 10.1109/TCIAIG.2010.2067212
Broderick Arneson, Ryan Hayward, Philip Henderson, Wolve Wins Hex Tournament (2008)
A.G. Barto, R.S. Sutton, Reinforcement Learning: An Introduction (1998)
Ryan Hayward, MOHEX Wins Hex Tournament. ICGA Journal, vol. 36, pp. 180-183 (2009). DOI: 10.3233/ICG-2012-35212
S. Reisch, Hex ist PSPACE-vollständig. Acta Informatica, vol. 15, pp. 167-191 (1981)