Issues in Using Function Approximation for Reinforcement Learning

Authors: Sebastian Thrun, Anton Schwartz

DOI:

Keywords:

Abstract: Reinforcement learning techniques address the problem of learning to select actions in unknown, dynamic environments. It is widely acknowledged that, to be of use in complex domains, reinforcement learning must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combinations, and many researchers have encountered failures in practice. In this paper we identify a prime source of such failures: a systematic overestimation of utility values. Using Watkins' Q-Learning [18] as an example, we give a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail. Employing some of the most popular function approximators, we present experimental results which support the theoretical findings.
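The overestimation effect the abstract describes can be illustrated in a few lines of simulation. Below is a minimal sketch (not code from the paper) that models function-approximation error as zero-mean Gaussian noise on the Q-values: even when every action has the same true value, the expected maximum over the noisy estimates exceeds the true maximum, which is the systematic upward bias the authors analyze.

```python
import numpy as np

# Minimal sketch of the overestimation phenomenon (assumption: approximation
# error behaves like zero-mean Gaussian noise on the Q-value estimates).
rng = np.random.default_rng(0)

true_q = np.zeros(10)   # all 10 actions equally good: true max_a Q(s, a) = 0
noise_scale = 0.5       # stand-in for the magnitude of approximation error

trials = 100_000
noisy_max = np.empty(trials)
for i in range(trials):
    # Each trial, corrupt the Q-values with independent zero-mean noise
    # and take the max over actions, as Q-Learning's update rule does.
    noisy_q = true_q + rng.normal(0.0, noise_scale, size=true_q.shape)
    noisy_max[i] = noisy_q.max()

print(f"true max_a Q(s, a):           {true_q.max():.3f}")
print(f"mean of max_a (Q(s, a) + e):  {noisy_max.mean():.3f}")  # clearly > 0
```

Running this prints an average noisy maximum well above zero, even though the noise itself has zero mean: the max operator selects whichever estimate happens to be overestimated, so the bias compounds through bootstrapped updates.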

References (14)
Vijaykumar Gullapalli. Reinforcement learning and its application to control. University of Massachusetts (1992).
Richard Stuart Sutton. Temporal credit assignment in reinforcement learning. University of Massachusetts Amherst (1984).
Andrew William Moore. Efficient memory-based learning for robot control (1990).
Leslie Pack Kaelbling, David Chapman. Input generalization in delayed reinforcement learning: an algorithm and performance comparisons. International Joint Conference on Artificial Intelligence, pp. 726–731 (1991).
Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, vol. 72, pp. 81–138 (1995). doi:10.1016/0004-3702(94)00011-O
Satinder P. Singh, Richard C. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, vol. 16, pp. 227–233 (1994). doi:10.1023/A:1022693225949
Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, vol. 3, pp. 9–44 (1988). doi:10.1023/A:1022633531479
Gerald Tesauro. Practical issues in temporal difference learning. Machine Learning, vol. 8, pp. 257–277 (1992). doi:10.1007/BF00992697
Steven J. Bradtke. Reinforcement learning applied to linear quadratic regulation. Neural Information Processing Systems, vol. 5, pp. 295–302 (1992).