Nonparametric Model-Based Reinforcement Learning

Author: Christopher G. Atkeson

DOI:

Keywords: Plan (drawing), Reinforcement learning, Computer science, Trajectory, Artificial intelligence, Nonparametric model, Error-driven learning, Machine learning, Learning classifier system

Abstract: This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned non-parametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.
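
The trade-off the abstract describes, between obeying the learned model and minimizing cost, can be illustrated with a penalty-style objective. The sketch below is an assumption for illustration only, not the paper's algorithm: the function name `soft_constrained_cost`, the weight `lam`, and the scalar state and action are all hypothetical. It scores a candidate trajectory by its task cost plus a weighted penalty on how far each planned transition departs from the learned model's prediction.

```python
import numpy as np


def soft_constrained_cost(xs, us, model, step_cost, lam):
    """Task cost of a trajectory plus a penalty for disagreeing with the model.

    xs: planned states x_0..x_T (length T+1), scalars in this sketch
    us: planned actions u_0..u_{T-1} (length T)
    model: learned one-step predictor, model(x, u) -> predicted next state
    step_cost: per-step cost, step_cost(x, u) -> float
    lam: hypothetical weight on model consistency (large lam ~ hard constraint)
    """
    xs = np.asarray(xs, dtype=float)
    us = np.asarray(us, dtype=float)
    # Cost accumulated along the planned trajectory.
    task_cost = sum(step_cost(x, u) for x, u in zip(xs[:-1], us))
    # Gap between each planned next state and the model's prediction.
    predicted = np.array([model(x, u) for x, u in zip(xs[:-1], us)])
    defects = xs[1:] - predicted
    return float(task_cost + lam * np.sum(defects ** 2))
```

With a small `lam`, a plan that violates the learned model can score better than a fully consistent one, which is the regime the abstract reports as helpful early in learning; letting `lam` grow pushes the planner back toward full consistency with the model.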
