Reinforcement Learning in Large Discrete Action Spaces.

Authors: Gabriel Dulac-Arnold, Peter Sunehag, Ben Coppin, Richard Evans

Abstract: Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems. Recommender systems, industrial plants and language models are only some of the many real-world tasks involving large numbers of discrete actions for which current methods are difficult or even often impossible to apply. An ability to generalize over the set of actions as well as sub-linear complexity relative to the size of the set are both necessary to handle such tasks. Current approaches are not able to provide both of these, which motivates the work in this paper. Our proposed approach leverages prior information about the actions to embed them in a continuous space upon which it can generalize. Additionally, approximate nearest-neighbor methods allow for logarithmic-time lookup complexity relative to the number of actions, which is necessary for time-wise tractable training. This combined approach allows reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods. We demonstrate our algorithm's abilities on a series of tasks having up to one million actions.
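The lookup step described in the abstract can be illustrated with a minimal sketch. It assumes the policy outputs a point in the continuous embedding space (called `proto_action` here), retrieves the `k` nearest embedded discrete actions, and keeps the candidate that a value estimate scores highest. The names `select_action`, `proto_action`, `q_value` and the toy data are hypothetical and do not come from the record above. The search below is an exact linear scan used only for brevity; it does not have the logarithmic-time behaviour the abstract attributes to approximate nearest-neighbor methods.

```python
import heapq
import math


def select_action(proto_action, action_embeddings, q_value, k=3):
    """Return the index of the best-scoring action among the k nearest embeddings.

    proto_action:      point in the embedding space (assumed policy output)
    action_embeddings: sequence of points, one per discrete action
    q_value:           callable scoring an embedded action (assumed value estimate)
    """
    def distance(i):
        return math.dist(proto_action, action_embeddings[i])

    # Exact O(n) scan; a real system would swap in an approximate
    # nearest-neighbor index to obtain sub-linear lookup.
    candidates = heapq.nsmallest(k, range(len(action_embeddings)), key=distance)

    # Refine the candidate set with the value estimate.
    return max(candidates, key=lambda i: q_value(action_embeddings[i]))


# Hypothetical toy problem: 1001 actions embedded on a line in [0, 1].
embeddings = [(i / 1000,) for i in range(1001)]


def toy_q(action):
    # Illustrative value function that prefers actions close to 0.5.
    return -abs(action[0] - 0.5)


chosen = select_action((0.4,), embeddings, toy_q, k=5)
nearest_only = select_action((0.4,), embeddings, toy_q, k=1)
```

With `k=1` the sketch reduces to a plain nearest-neighbor mapping from the continuous output to a discrete action. A larger `k` lets the value estimate correct for cases where the closest embedding is not the best action.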

References (17)
T. G. Dietterich, G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, vol. 2, pp. 263-286 (1994). DOI: 10.1613/JAIR.105
Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, Patrick Gallinari. Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization. Machine Learning and Knowledge Discovery in Databases, vol. 7524, pp. 180-194 (2012). DOI: 10.1007/978-3-642-33486-3_12
Roland Hafner, Martin Riedmiller. Reinforcement learning in feedback control. Machine Learning, vol. 84, pp. 137-169 (2011). DOI: 10.1007/S10994-011-5235-X
Marius Muja, David G. Lowe. Scalable Nearest Neighbor Algorithms for High Dimensional Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, pp. 2227-2240 (2014). DOI: 10.1109/TPAMI.2014.2321376
Michail G. Lagoudakis, Ronald Parr. Reinforcement learning as classification: leveraging modern classifiers. International Conference on Machine Learning, pp. 424-431 (2003)
D. V. Prokhorov, D. C. Wunsch. Adaptive critic designs. IEEE Transactions on Neural Networks, vol. 8, pp. 997-1007 (1997). DOI: 10.1109/72.623201
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis. Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533 (2015). DOI: 10.1038/NATURE14236
Emanuel Todorov, Tom Erez, Yuval Tassa. MuJoCo: A physics engine for model-based control. Intelligent Robots and Systems, pp. 5026-5033 (2012). DOI: 10.1109/IROS.2012.6386109
Hado van Hasselt, Marco A. Wiering. Using continuous action spaces to solve discrete problems. International Joint Conference on Neural Networks, pp. 1144-1151 (2009). DOI: 10.1109/IJCNN.2009.5178745