Neurohex: A Deep Q-learning Hex Agent

Authors: Ryan Hayward, Kenny Young, Gautham Vasan

DOI:

Keywords: Artificial neural network; Reinforcement learning; Initialization; Artificial intelligence; Action (philosophy); State space; Champion; Computer science; Q-learning; Olympiad

Abstract: DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents --- e.g. for Atari games via Q-learning and for the game of Go via reinforcement learning --- raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer CNN that plays Hex on the 13x13 board. Hex is the classic two-player alternate-turn stone-placement game, played on a rhombus of hexagonal cells, in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After weeks of Q-learning, NeuroHex achieves win-rates of 20.4% as first player and 2.1% as second player against a 1-second/move version of MoHex, the current ICGA Olympiad champion. Our data suggests that further improvement might be possible with more training time.
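To make the setup concrete, the sketch below illustrates the kind of search-free Q-learning agent the abstract describes: a fully convolutional network over the 13x13 board producing one Q-value per cell, plus a one-step target for self-play updates. This is not the authors' implementation; the framework (PyTorch), the 2-plane board encoding, the layer width, and the negamax-style target convention are all assumptions of the sketch, not details taken from the paper.

```python
# Minimal sketch (not the NeuroHex implementation) of a convolutional Q-network
# for 13x13 Hex and a one-step self-play Q-learning target. Framework, input
# encoding, and target convention are assumptions of this sketch.
import torch
import torch.nn as nn

BOARD = 13  # board size used in the paper

class HexQNet(nn.Module):
    """Fully convolutional Q-network: 11 conv layers, one Q-value per cell."""
    def __init__(self, in_planes=2, width=64, layers=11):
        super().__init__()
        blocks, c = [], in_planes
        for _ in range(layers - 1):
            blocks += [nn.Conv2d(c, width, kernel_size=3, padding=1), nn.ReLU()]
            c = width
        blocks += [nn.Conv2d(c, 1, kernel_size=1), nn.Tanh()]  # Q-values in [-1, 1]
        self.net = nn.Sequential(*blocks)

    def forward(self, boards):               # boards: (N, in_planes, 13, 13)
        return self.net(boards).flatten(1)   # (N, 169), one Q-value per move

def one_step_target(q_net, next_boards, rewards, terminal, legal_mask):
    """Negamax-style one-step target (an assumption of this sketch): the reward
    at terminal positions, otherwise minus the opponent's best reply value."""
    with torch.no_grad():
        q_next = q_net(next_boards).masked_fill(~legal_mask, -1.0)  # ignore illegal cells
        best_reply = q_next.max(dim=1).values
    return torch.where(terminal, rewards, -best_reply)

if __name__ == "__main__":
    net = HexQNet()
    boards = torch.zeros(4, 2, BOARD, BOARD)                    # batch of empty boards
    legal = torch.ones(4, BOARD * BOARD, dtype=torch.bool)
    r, done = torch.zeros(4), torch.zeros(4, dtype=torch.bool)
    print(net(boards).shape)                                    # torch.Size([4, 169])
    print(one_step_target(net, boards, r, done, legal).shape)   # torch.Size([4])
```

Producing one Q-value per cell ties the action space directly to board positions, which is what lets a trained network play with no search: the agent simply selects the legal cell with the highest Q-value.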

References (7)
Claude Shannon, Computers and Automata. Proceedings of the IRE, vol. 41, pp. 1234-1241 (1953). DOI: 10.1109/JRPROC.1953.274273
Vadim V. Anshelevich, The Game of Hex: An Automatic Theorem Proving Approach to Game Programming. National Conference on Artificial Intelligence, pp. 189-194 (2000)
Broderick Arneson, Ryan B. Hayward, Philip Henderson, Monte Carlo Tree Search in Hex. IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, pp. 251-258 (2010). DOI: 10.1109/TCIAIG.2010.2067212
Broderick Arneson, Ryan Hayward, Philip Henderson, Wolve Wins Hex Tournament (2008)
A.G. Barto, R.S. Sutton, Reinforcement Learning: An Introduction (1998)
Ryan Hayward, MOHEX Wins Hex Tournament. ICGA Journal, vol. 36, pp. 180-183 (2009). DOI: 10.3233/ICG-2012-35212
S. Reisch, Hex ist PSPACE-vollständig. Acta Informatica, vol. 15, pp. 167-191 (1981)