Authors: Sebastian Thrun, Anton Schwartz
DOI:
Keywords:
Abstract: Reinforcement learning techniques address the problem of how to select actions in unknown, dynamic environments. It is widely acknowledged that, to be of use in complex domains, reinforcement learning techniques must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combinations, and many researchers have encountered failures in practice. In this paper we identify a prime source of such failures, namely, the systematic overestimation of utility values. Using Watkins' Q-Learning [18] as an example, we give a theoretical account of the phenomenon, deriving conditions under which one may expect it to cause learning to fail. Employing some of the most popular function approximators, we present experimental results that support the theoretical findings.
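The overestimation the abstract refers to can be illustrated with a minimal simulation (a sketch, not code from the paper): when Q-value estimates carry zero-mean approximation error, taking the maximum over actions, as Q-Learning's update target does, is biased upward. All parameters below (number of actions, noise scale) are illustrative assumptions.

```python
import numpy as np

# Sketch: max over noisy value estimates systematically overestimates
# the true maximum, even when the estimation error itself is unbiased.

rng = np.random.default_rng(0)

n_actions = 10                 # actions available in a state (assumed)
true_q = np.zeros(n_actions)   # all actions equally good: true max is 0
noise_scale = 0.1              # zero-mean approximation error in [-eps, eps] (assumed)
n_trials = 100_000

# Draw noisy estimates and take the max over actions, as the Q-Learning
# target max_a Q_hat(s', a) does in each backup.
noisy_q = true_q + rng.uniform(-noise_scale, noise_scale,
                               size=(n_trials, n_actions))
estimated_max = noisy_q.max(axis=1)

print(f"true max:           {true_q.max():.4f}")
print(f"mean estimated max: {estimated_max.mean():.4f}")  # positive: systematic overestimation
```

For n estimates drawn uniformly from [-eps, eps], the expected maximum is eps*(n-1)/(n+1) by standard order statistics, so with the values above the printed mean should land near 0.082 rather than the true maximum of 0.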