Authors: Michail G. Lagoudakis, Ronald Parr
DOI:
Keywords:
Abstract: The basic tools of machine learning appear in the inner loop of most reinforcement learning algorithms, typically in the form of Monte Carlo methods or function approximation techniques. To a large extent, however, current reinforcement learning algorithms draw upon machine learning techniques that are at least ten years old and, with a few exceptions, very little has been done to exploit recent advances in classification learning for the purposes of reinforcement learning. We use a variant of approximate policy iteration based on rollouts that allows us to use a pure classification learner, such as a support vector machine (SVM), in the inner loop of the algorithm. We argue that the use of SVMs, particularly in combination with the kernel trick, can make it easier to apply reinforcement learning as an "out-of-the-box" technique, without extensive feature engineering. Our approach opens the door to modern classification methods, but does not preclude the use of classical methods. We present experimental results in the pendulum balancing and bicycle riding domains using both SVMs and neural networks for classifiers.
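The rollout-based scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional balancing task, the discount factor, the fixed-margin rule for deciding when one action is clearly better, and all function and class names. To keep the example free of dependencies, a one-nearest-neighbour classifier stands in for the SVM or neural network; any learner exposing `fit` and `predict` could be swapped in.

```python
import random

ACTIONS = (-1, +1)   # push left / push right (hypothetical toy task)
GAMMA = 0.95         # discount factor, assumed for this sketch


def step(x, a, rng):
    """Toy unstable 1-D system; returns (next_state, reward, done)."""
    nxt = 1.1 * x + 0.1 * a + rng.uniform(-0.02, 0.02)
    if abs(nxt) > 1.0:            # leaving [-1, 1] counts as a failure
        return nxt, -1.0, True
    return nxt, 0.0, False


def rollout_return(x, a, policy, rng, horizon=30):
    """Discounted return of taking `a` in `x`, then following `policy`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        x, reward, done = step(x, a, rng)
        total += discount * reward
        if done:
            break
        discount *= GAMMA
        a = policy(x)
    return total


class NearestNeighbor:
    """Stand-in classifier (1-NN); an SVM could be used here instead."""

    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
        return self

    def predict(self, x):
        return min(self.data, key=lambda pair: abs(pair[0] - x))[1]


def approximate_policy_iteration(rng, iterations=3, n_rollouts=20, margin=0.05):
    """Each iteration: estimate action values by rollouts under the current
    policy, label states where one action is clearly better, and fit a
    classifier to those labels. The classifier becomes the next policy."""
    policy = lambda x: rng.choice(ACTIONS)        # start from a random policy
    states = [i / 10 for i in range(-9, 10)]      # fixed grid of sample states
    for _ in range(iterations):
        xs, ys = [], []
        for s in states:
            q = {
                a: sum(rollout_return(s, a, policy, rng)
                       for _ in range(n_rollouts)) / n_rollouts
                for a in ACTIONS
            }
            best = max(ACTIONS, key=q.get)
            worst = min(ACTIONS, key=q.get)
            if q[best] - q[worst] > margin:       # keep only clear-cut states
                xs.append(s)
                ys.append(best)
        if not xs:                                # nothing to learn from
            break
        policy = NearestNeighbor().fit(xs, ys).predict
    return policy


if __name__ == "__main__":
    learned = approximate_policy_iteration(random.Random(0))
    print([learned(x) for x in (-0.5, 0.0, 0.5)])
```

The point of the sketch is that the learner in the inner loop never sees value estimates directly. It only receives state and action labels, which is what would allow a pure classifier to be used there.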