Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization

Authors: Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, Patrick Gallinari

DOI: 10.1007/978-3-642-33486-3_12

Keywords: Error correcting, Markov decision process, Artificial intelligence, Coding (social sciences), Learning complexity, Mathematics, Factorization, Classifier (UML), Discrete mathematics, Reinforcement learning, Rendering (computer graphics)

Abstract: The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework, where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollouts-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from $\mathcal{O}(A^2)$ to $\mathcal{O}(A \log(A))$. We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into $\mathcal{O}(\log(A))$ separate two-action MDPs. This second method reduces learning complexity even further, to $\mathcal{O}(\log(A))$, thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, in both speed and performance.
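The core idea behind the coding dictionary can be illustrated with a minimal sketch. The Python snippet below (an assumption for illustration, not the paper's implementation) builds a compact coding matrix that assigns each of A actions a ceil(log2(A))-bit codeword, so that each bit column defines one binary sub-problem, and recovers an action from per-bit predictions by nearest-codeword (Hamming) decoding. The paper's actual ECOC dictionaries and rollout-based training are more involved.

    import numpy as np

    def coding_matrix(num_actions):
        # One row per action, one column per bit: the binary expansion
        # of the action index gives ceil(log2(A)) binary sub-problems.
        num_bits = int(np.ceil(np.log2(num_actions)))
        return np.array([[(a >> b) & 1 for b in range(num_bits)]
                         for a in range(num_actions)])

    def decode(bit_predictions, M):
        # Nearest-codeword decoding: pick the action whose codeword has
        # the smallest Hamming distance to the predicted bit vector,
        # which tolerates a few misclassified bits.
        dists = np.abs(M - bit_predictions).sum(axis=1)
        return int(np.argmin(dists))

    # Example: 12 actions factor into ceil(log2(12)) = 4 binary tasks.
    M = coding_matrix(12)
    preds = np.array([1, 0, 1, 0])   # outputs of the 4 binary policies
    print(decode(preds, M))          # -> action 5 (codeword 1010)

Under this factorization, each bit column can be treated as its own two-action problem, which is what brings the per-decision learning cost down to a logarithmic number of binary learners.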

References (3)
T. G. Dietterich, G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, vol. 2, pp. 263–286 (1994). DOI: 10.1613/JAIR.105
Michail G. Lagoudakis, Ronald Parr. Reinforcement learning as classification: leveraging modern classifiers. International Conference on Machine Learning, pp. 424–431 (2003).
Alina Beygelzimer, John Langford, Bianca Zadrozny. Machine Learning Techniques—Reductions Between Prediction Quality Metrics. Performance Modeling and Engineering, pp. 3–28 (2008). DOI: 10.1007/978-0-387-79361-0_1