Lyapunov-Constrained Action Sets for Reinforcement Learning

Authors: Andrew G. Barto, Theodore J. Perkins

DOI:

Keywords:

Abstract: Lyapunov analysis is a standard approach to studying the stability of dynamical systems and to designing controllers. We propose to design the actions of a reinforcement learning (RL) agent to be descending on a Lyapunov function. For minimum cost-to-target problems, this has the theoretical benefit of guaranteeing that the agent will reach a goal state on every trial, regardless of the RL algorithm it uses. In practice, Lyapunov-descent constraints can significantly shorten learning trials, improve initial and worst-case performance, and accelerate learning. Although this method of constraining actions may limit the extent to which an RL agent can minimize cost, it allows one to construct robust RL systems for problems in which Lyapunov domain knowledge is available. This includes many important individual problems as well as general classes of problems, such as the control of feedback linearizable systems (e.g., industrial robots) and continuous-state path-planning problems. We demonstrate the general approach on two simulated control problems: pendulum swing-up and robot arm control.
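
The idea of restricting an agent to Lyapunov-descending actions can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional integer state, the dynamics `step`, the Lyapunov function `lyapunov`, the action list `ACTIONS`, and the threshold `delta` are all hypothetical choices made for illustration. The sketch shows only that a policy choosing arbitrarily (here, at random) among descending actions still reaches the goal in a bounded number of steps.

```python
import random

# Hypothetical action set for a toy 1-D integer-state problem (not from the paper).
ACTIONS = [-2, -1, 0, 1, 2]

def lyapunov(x):
    """Assumed Lyapunov function: distance from the goal state 0."""
    return abs(x)

def step(x, a):
    """Assumed deterministic toy dynamics: the action shifts the state."""
    return x + a

def descending_actions(x, delta=1):
    """Keep only actions that lower the Lyapunov function by at least delta."""
    return [a for a in ACTIONS if lyapunov(step(x, a)) <= lyapunov(x) - delta]

def run_trial(x0, rng, max_steps=1000):
    """Choose randomly within the constrained set; return steps taken to reach the goal."""
    x, steps = x0, 0
    while x != 0 and steps < max_steps:
        x = step(x, rng.choice(descending_actions(x)))
        steps += 1
    return steps

if __name__ == "__main__":
    rng = random.Random(0)
    # Every step lowers lyapunov(x) by at least 1, so a trial from x0
    # takes at most abs(x0) steps whatever the action-selection rule is.
    print(run_trial(7, rng))
```

In this toy setting the constrained set is never empty away from the goal, which is what makes the bound on trial length hold. A learning algorithm could replace the random choice without affecting that bound, although it could only optimize cost within the constrained set.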
