Authors: A. G. Barto, R. S. Sutton, C. J. C. H. Watkins
DOI:
Keywords:
Abstract: Decision-making tasks that involve delayed consequences are very common yet difficult to address with supervised learning methods. If there is an accurate model of the underlying dynamical system, then these tasks can be formulated as sequential decision problems and solved by Dynamic Programming. This paper discusses reinforcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework. Neural networks can play significant roles as modules for approximating the functions required for solving sequential decision problems.