Q-Learning with Hidden-Unit Restarting

Author: Charles W. Anderson

DOI:

Keywords:

Abstract: Platt's resource-allocation network (RAN) (Platt, 1991a, 1991b) is modified for a reinforcement-learning paradigm and to "restart" existing hidden units rather than add new units. After restarting, units continue to learn via back-propagation. The resulting restart algorithm is tested in a Q-learning network that learns to solve an inverted pendulum problem. Solutions are found faster on average with the restart algorithm than without it.

References (10)
Charles W. Anderson, Strategy Learning with Multilayer Connectionist Representations. Proceedings of the Fourth International Workshop on Machine Learning, June 22–25, 1987, University of California, Irvine, pp. 103-114 (1987), 10.1016/B978-0-934613-41-5.50014-3
A. Klopf, Earl Gose, An Evolutionary Pattern Recognition Network. IEEE Transactions on Systems Science and Cybernetics, vol. 5, pp. 247-250 (1969), 10.1109/TSSC.1969.300268
Darrell Whitley, Stephen Dominic, Rajarshi Das, Charles W. Anderson, Genetic Reinforcement Learning for Neurocontrol Problems. Machine Learning, vol. 13, pp. 259-284 (1993), 10.1007/BF00993045
Andrew G. Barto, Richard S. Sutton, Charles W. Anderson, Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems. IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 834-846 (1983), 10.1109/TSMC.1983.6313077
Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences. Machine Learning, vol. 3, pp. 9-44 (1988), 10.1023/A:1022633531479
John C. Platt, Learning by Combining Memorization and Gradient Descent. Neural Information Processing Systems, vol. 3, pp. 714-720 (1990)
John Platt, A Resource-Allocating Network for Function Interpolation. Neural Computation, vol. 3, pp. 213-225 (1991), 10.1162/NECO.1991.3.2.213
Michael Jordan, Robert Jacobs, Learning to Control an Unstable System with Forward Modeling. Neural Information Processing Systems, vol. 2, pp. 324-331 (1989)
Paul Smolensky, Michael C. Mozer, Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment. Neural Information Processing Systems, vol. 1, pp. 107-115 (1988)