Issues in Using Function Approximation for Reinforcement Learning

Authors: Sebastian Thrun, Anton Schwartz

DOI:

Keywords:

Abstract: Reinforcement learning techniques address the problem of learning to select actions in unknown, dynamic environments. It is widely acknowledged that, to be of use in complex domains, reinforcement learning must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combinations, and many researchers have encountered failures in practice. In this paper we identify a prime source of such failures: a systematic overestimation of utility values. Using Watkins' Q-Learning [18] as an example, we give a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail. Employing some of the most popular function approximators, we present experimental results which support the theoretical findings.
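The overestimation effect the abstract describes can be illustrated in a few lines of simulation. Below is a minimal sketch (not code from the paper) that models function-approximation error as zero-mean Gaussian noise on the Q-values: even when every action has the same true value, the expected maximum over the noisy estimates exceeds the true maximum, which is the systematic upward bias the authors analyze.

```python
import numpy as np

# Minimal sketch of the overestimation phenomenon (assumption: approximation
# error behaves like zero-mean Gaussian noise on the Q-value estimates).
rng = np.random.default_rng(0)

true_q = np.zeros(10)   # all 10 actions equally good: true max_a Q(s, a) = 0
noise_scale = 0.5       # stand-in for the magnitude of approximation error

trials = 100_000
noisy_max = np.empty(trials)
for i in range(trials):
    # Each trial, corrupt the Q-values with independent zero-mean noise
    # and take the max over actions, as Q-Learning's update rule does.
    noisy_q = true_q + rng.normal(0.0, noise_scale, size=true_q.shape)
    noisy_max[i] = noisy_q.max()

print(f"true max_a Q(s, a):           {true_q.max():.3f}")
print(f"mean of max_a (Q(s, a) + e):  {noisy_max.mean():.3f}")  # clearly > 0
```

Running this prints an average noisy maximum well above zero, even though the noise itself has zero mean: the max operator selects whichever estimate happens to be overestimated, so the bias compounds through bootstrapped updates.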

References (14)
Vijaykumar Gullapalli. Reinforcement learning and its application to control. University of Massachusetts (1992).
Richard Stuart Sutton. Temporal credit assignment in reinforcement learning. University of Massachusetts Amherst (1984).
Andrew William Moore. Efficient memory-based learning for robot control (1990).
Leslie Pack Kaelbling, David Chapman. Input generalization in delayed reinforcement learning: an algorithm and performance comparisons. International Joint Conference on Artificial Intelligence, pp. 726–731 (1991).
Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, vol. 72, pp. 81–138 (1995). doi:10.1016/0004-3702(94)00011-O
Satinder P. Singh, Richard C. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, vol. 16, pp. 227–233 (1994). doi:10.1023/A:1022693225949
Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, vol. 3, pp. 9–44 (1988). doi:10.1023/A:1022633531479
Gerald Tesauro. Practical issues in temporal difference learning. Machine Learning, vol. 8, pp. 257–277 (1992). doi:10.1007/BF00992697
Steven J. Bradtke. Reinforcement learning applied to linear quadratic regulation. Neural Information Processing Systems, vol. 5, pp. 295–302 (1992).