Authors: Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, Patrick Gallinari
DOI: 10.1007/978-3-642-33486-3_12
Keywords: Error correcting, Markov decision process, Artificial intelligence, Coding (social sciences), Learning complexity, Mathematics, Factorization, Classifier (UML), Discrete mathematics, Reinforcement learning, Rendering (computer graphics)
Abstract: The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in a supervised classification framework in which the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollouts-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from $\mathcal{O}(A^2)$ to $\mathcal{O}(A \log(A))$. We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into $\mathcal{O}(\log(A))$ separate two-action MDPs. This second method reduces learning complexity even further, to $\mathcal{O}(\log(A))$, thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, in terms of both speed and performance.
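To make the ECOC mechanism concrete, below is a minimal sketch of ECOC-style action decoding in Python. It assumes a binary coding dictionary that maps each of the A actions to a ceil(log2(A))-bit codeword, with one binary classifier per bit, so only O(log(A)) classifiers are needed instead of one per action. The names `code_matrix`, `predict_bits`, and `decode_action` are illustrative rather than taken from the paper, and the dictionary shown is a plain binary expansion rather than a distance-optimized error-correcting code.

```python
import numpy as np

A = 16                                # number of actions (assumed for illustration)
n_bits = int(np.ceil(np.log2(A)))     # O(log A) binary sub-problems

# Hypothetical coding dictionary: row a is the codeword for action a.
# Here it is simply the binary expansion of the action index; a real ECOC
# dictionary would be chosen to maximize inter-codeword Hamming distance.
code_matrix = np.array(
    [[(a >> b) & 1 for b in range(n_bits)] for a in range(A)]
)

def predict_bits(state, classifiers):
    """Query each of the n_bits binary classifiers on the current state."""
    return np.array([clf(state) for clf in classifiers])  # entries in {0, 1}

def decode_action(bits):
    """Standard ECOC decoding: return the action whose codeword has
    minimal Hamming distance to the predicted bit vector."""
    distances = np.abs(code_matrix - bits).sum(axis=1)
    return int(np.argmin(distances))

# Toy usage with dummy classifiers that ignore the state; in the paper's
# setting each classifier would be trained from rollout-labeled states.
rng = np.random.default_rng(0)
dummy_classifiers = [(lambda s: int(rng.integers(2))) for _ in range(n_bits)]
action = decode_action(predict_bits(None, dummy_classifiers))
print(action)  # an action index in [0, A)
```

Because decoding picks the nearest codeword, a single wrong bit prediction can still recover the intended action when the dictionary has enough Hamming distance between codewords, which is what makes the factorization into independent two-action problems viable.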