Sequential Decision Problems and Neural Networks

Authors: A. G. Barto, R. S. Sutton, C. J. C. H. Watkins

Abstract: Decision making tasks that involve delayed consequences are very common yet difficult to address with supervised learning methods. If there is an accurate model of the underlying dynamical system, then these tasks can be formulated as sequential decision problems and solved by Dynamic Programming. This paper discusses reinforcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework. Neural networks can play significant roles as modules for approximating the functions required for solving sequential decision problems.

References (14)
Richard Stuart Sutton. Temporal credit assignment in reinforcement learning. University of Massachusetts Amherst (1984).
Michael R. Hilliard, Gunar E. Liepins, Gita Rangarajan, Mark Palmer. Alternatives for classifier system credit assignment. International Joint Conference on Artificial Intelligence, pp. 756-761 (1989).
Charles W. Anderson. Learning and Problem Solving with Multilayer Connectionist Systems. University Microfilms International (1986).
D. P. Bertsekas, Chelsea C. White. Dynamic Programming and Stochastic Control. IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, pp. 758-759 (1977). doi:10.1109/TSMC.1977.4309612
Ian H. Witten. An adaptive optimal controller for discrete-time Markov environments. Information and Control, vol. 34, pp. 286-295 (1977). doi:10.1016/S0019-9958(77)90354-0
Andrew G. Barto, Richard S. Sutton, Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 834-846 (1983). doi:10.1109/TSMC.1983.6313077
Richard S. Sutton. Learning to Predict by the Methods of Temporal Differences. Machine Learning, vol. 3, pp. 9-44 (1988). doi:10.1023/A:1022633531479