Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

Author: Richard S. Sutton

DOI:

Keywords: Parameterized complexity, Artificial neural network, Dynamic programming, Artificial intelligence, Reinforcement learning, Computer science, Function approximation, Coding (social sciences), Monte Carlo method

Abstract: On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse-coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes ("rollouts"), as in classical Monte Carlo methods, and as in the TD(λ) algorithm when λ = 1. However, in our experiments this always resulted in substantially poorer performance. We conclude that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
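The sparse-coarse-coded function approximators (CMACs, also known as tile coding) central to the abstract can be illustrated with a minimal sketch. The code below is not from the paper; the function name, grid sizes, and offset scheme are illustrative assumptions. It shows the key property the abstract relies on: each tiling contributes exactly one active feature (sparsity), while overlapping offset tilings make nearby inputs share features (coarse coding, hence generalization).

```python
import numpy as np

def tile_indices(x, y, num_tilings=8, tiles_per_dim=8,
                 x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Return one active tile index per tiling for a 2-D input.

    Each tiling is a uniform grid shifted by a fraction of a tile
    width, so nearby inputs activate many of the same tiles while
    each tiling contributes exactly one feature.
    """
    indices = []
    # Normalize the input to [0, 1) in each dimension.
    x01 = (x - x_range[0]) / (x_range[1] - x_range[0])
    y01 = (y - y_range[0]) / (y_range[1] - y_range[0])
    for t in range(num_tilings):
        offset = t / num_tilings  # fractional-tile offset per tiling
        col = int(x01 * tiles_per_dim + offset) % tiles_per_dim
        row = int(y01 * tiles_per_dim + offset) % tiles_per_dim
        # Flatten (tiling, row, col) into a single feature index.
        indices.append(t * tiles_per_dim ** 2 + row * tiles_per_dim + col)
    return indices

# A linear value estimate is just the sum of weights at the active tiles,
# so online TD updates touch only num_tilings weights per step.
weights = np.zeros(8 * 8 * 8)
active = tile_indices(0.3, 0.7)
value = weights[active].sum()
```

Because only `num_tilings` features are active for any input, the online TD(λ) updates described in the paper are cheap, and the overlap between offset tilings controls how broadly learning generalizes between similar states.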

References (23)
Thomas Dean, Ken Basye, John Shewchuk. Reinforcement Learning for Planning and Control. In: Machine Learning Methods for Planning, pp. 67-92 (1993). doi:10.1016/B978-1-4832-0774-2.50008-1
Richard S. Sutton, Steven D. Whitehead. Online learning with random representations. International Conference on Machine Learning, pp. 314-321 (1993). doi:10.1016/B978-1-55860-307-3.50047-2
Mark W. Spong, Mathukumalli Vidyasagar. Robot Dynamics and Control (1989).
Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming. In: Machine Learning Proceedings 1995, pp. 261-268 (1995). doi:10.1016/B978-1-55860-377-6.50040-2
G. DeJong, M. W. Spong. Swinging up the Acrobot: an example of intelligent control. Advances in Computing and Communications, vol. 2, pp. 2158-2162 (1994). doi:10.1109/ACC.1994.752458
Richard Stuart Sutton. Temporal Credit Assignment in Reinforcement Learning. University of Massachusetts Amherst (1984).
Thomas G. Dietterich, Wei Zhang. A reinforcement learning approach to job-shop scheduling. International Joint Conference on Artificial Intelligence, pp. 1114-1120 (1995).
Leemon Baird. Residual Algorithms: Reinforcement Learning with Function Approximation. In: Machine Learning Proceedings 1995, pp. 30-37 (1995). doi:10.1016/B978-1-55860-377-6.50013-X
C.-S. Lin, H. Kim. CMAC-based adaptive critic self-learning control. IEEE Transactions on Neural Networks, vol. 2, pp. 530-533 (1991). doi:10.1109/72.134290
Andrew G. Barto, Richard S. Sutton, Charles W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 834-846 (1983). doi:10.1109/TSMC.1983.6313077