Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization

Authors: Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, Patrick Gallinari

DOI: 10.1007/978-3-642-33486-3_12

Keywords: Error correcting, Markov decision process, Artificial intelligence, Coding (social sciences), Learning complexity, Mathematics, Factorization, Classifier (UML), Discrete mathematics, Reinforcement learning, Rendering (computer graphics)

Abstract: The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework, where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollouts-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from $\mathcal{O}(A^2)$ to $\mathcal{O}(A \log(A))$. We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into $\mathcal{O}(\log(A))$ separate two-action MDPs. This second method reduces learning complexity even further, to $\mathcal{O}(\log(A))$, thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, in both speed and performance.
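The core idea behind the coding dictionary can be illustrated with a minimal sketch. The Python snippet below (an assumption for illustration, not the paper's implementation) builds a compact coding matrix that assigns each of A actions a ceil(log2(A))-bit codeword, so that each bit column defines one binary sub-problem, and recovers an action from per-bit predictions by nearest-codeword (Hamming) decoding. The paper's actual ECOC dictionaries and rollout-based training are more involved.

    import numpy as np

    def coding_matrix(num_actions):
        # One row per action, one column per bit: the binary expansion
        # of the action index gives ceil(log2(A)) binary sub-problems.
        num_bits = int(np.ceil(np.log2(num_actions)))
        return np.array([[(a >> b) & 1 for b in range(num_bits)]
                         for a in range(num_actions)])

    def decode(bit_predictions, M):
        # Nearest-codeword decoding: pick the action whose codeword has
        # the smallest Hamming distance to the predicted bit vector,
        # which tolerates a few misclassified bits.
        dists = np.abs(M - bit_predictions).sum(axis=1)
        return int(np.argmin(dists))

    # Example: 12 actions factor into ceil(log2(12)) = 4 binary tasks.
    M = coding_matrix(12)
    preds = np.array([1, 0, 1, 0])   # outputs of the 4 binary policies
    print(decode(preds, M))          # -> action 5 (codeword 1010)

Under this factorization, each bit column can be treated as its own two-action problem, which is what brings the per-decision learning cost down to a logarithmic number of binary learners.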

References (3)
T. G. Dietterich, G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, vol. 2, pp. 263–286 (1994). DOI: 10.1613/JAIR.105
Michail G. Lagoudakis, Ronald Parr. Reinforcement learning as classification: leveraging modern classifiers. International Conference on Machine Learning, pp. 424–431 (2003).
Alina Beygelzimer, John Langford, Bianca Zadrozny. Machine Learning Techniques—Reductions Between Prediction Quality Metrics. Performance Modeling and Engineering, pp. 3–28 (2008). DOI: 10.1007/978-0-387-79361-0_1