Authors: Ryan Hayward, Kenny Young, Gautham Vasan
DOI:
Keywords: Artificial neural network, Reinforcement learning, Initialization, Artificial intelligence, Action (philosophy), State space, Champion, Computer science, Q-learning, Olympiad
Abstract: DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents --- e.g. for Atari games via deep Q-learning and for the game of Go via reinforcement learning --- raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider deep Q-learning for the game of Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer CNN that plays Hex on the 13x13 board. Hex is a classic two-player alternate-turn stone-placement game played on a rhombus of hexagonal cells, in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After two weeks of Q-learning, NeuroHex achieves win-rates of 20.4% as first player and 2.1% as second player against a 1-second/move version of MoHex, the current ICGA Olympiad Hex champion. Our data suggests further improvement might be possible with more training time.
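The abstract describes training a convolutional Q-network over Hex positions by temporal-difference updates. The sketch below is an illustrative, minimal Python/PyTorch version of such an update; the network width, input encoding, and the plain `r + gamma * max Q(s', a')` target are assumptions for illustration (NeuroHex's actual architecture, input planes, and self-play target handling of the alternating-turn, zero-sum structure are described in the paper itself, not here).

```python
# Minimal sketch (not the authors' code): one Q-learning update for a
# convolutional Hex position evaluator on a 13x13 board.
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 13  # 13x13 Hex board, as in the paper

class QNet(nn.Module):
    """Convolutional Q-network: board feature planes -> one Q-value per cell."""
    def __init__(self, in_planes=6, width=64, layers=11):
        super().__init__()
        convs = [nn.Conv2d(in_planes, width, 3, padding=1)]
        convs += [nn.Conv2d(width, width, 3, padding=1) for _ in range(layers - 2)]
        self.convs = nn.ModuleList(convs)
        self.head = nn.Conv2d(width, 1, 1)  # 1x1 conv: per-cell action value

    def forward(self, x):
        for c in self.convs:
            x = F.relu(c(x))
        return self.head(x).flatten(1)  # shape (batch, 13*13) action values

def q_update(net, opt, state, action, reward, next_state, done, gamma=1.0):
    """One TD(0) step: pull Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    q_sa = net(state).gather(1, action.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * (1 - done) * net(next_state).max(dim=1).values
    loss = F.mse_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In self-play for a two-player game like Hex, the "next state" is from the opponent's point of view, so a real implementation would negate or otherwise transform the bootstrapped value; the sketch keeps the single-agent form only to show the shape of the update.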