A complexity analysis of cooperative mechanisms in reinforcement learning

Author: Steven D. Whitehead

DOI:

Keywords: Context (language use), Reinforcement learning, Intelligent agent, Artificial intelligence, Optimal decision, Time complexity, State space, Decision problem, Learning classifier system, Computer science

Abstract: Reinforcement learning algorithms, when used to solve multi-stage decision problems, perform a kind of online (incremental) search to find an optimal decision policy. The time complexity of this search depends strongly upon the size and structure of the state space and upon the a priori knowledge encoded in the learner's initial parameter values. When a priori knowledge is not available, the time required for unbiased search can be excessive. Cooperative mechanisms help reduce this search by providing the learner with shorter latency feedback and auxiliary sources of experience. These mechanisms are based on the observation that, in nature, intelligent agents exist in a cooperative social environment that helps guide learning. Within this context, learning involves information transfer as much as it does discovery through trial-and-error. Two cooperative mechanisms are described: Learning with an External Critic (LEC) and Learning By Watching (LBW). Algorithms based on these mechanisms, along with Q-learning, are analyzed for problem solving tasks over a restricted class of state spaces. The results indicate that while Q-learning is expected to require time moderately exponential in the size of the state space, the LEC and LBW algorithms require at most linear time and, under appropriate conditions, time independent of the state space size altogether, requiring time proportional to the length of the solution path. While the analytic results apply only to this restricted class of tasks, they shed light on reinforcement learning in general and on the utility of cooperative mechanisms for reducing search.
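The contrast the abstract draws can be made concrete with a minimal sketch: standard tabular Q-learning learns only from the delayed goal reward, while an LEC-style external critic supplies shorter-latency auxiliary feedback. The chain environment, reward scheme, and `critic` function below are hypothetical stand-ins chosen for illustration, not the constructions analyzed in the paper.

```python
import random

N_STATES = 10          # states 0..N_STATES-1; the last state is the goal
ACTIONS = (-1, +1)     # move left or right along a simple chain
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Hypothetical deterministic chain: reward 1 only on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def critic(s, s2):
    """Stand-in external critic (LEC-style): immediate feedback rewarding
    progress toward the goal, shortening the latency of the reward signal."""
    return 0.1 if s2 > s else -0.1

def q_learning(episodes=500, use_critic=False):
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
            s2, r = step(s, a)
            if use_critic:
                r += critic(s, s2)   # auxiliary short-latency feedback
            # standard one-step Q-learning update
            target = r + GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2

q_learning(use_critic=True)
```

With `use_critic=False`, value information must propagate backward from the goal one step per visit, which is the source of the long unbiased search the abstract describes; the critic's per-step feedback lets useful value estimates form along the solution path immediately, in the spirit of the linear-time result claimed for LEC.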

References (9)
Rick L. Riolo, "Lookahead planning and latent learning in a classifier system," Simulation of Adaptive Behavior, pp. 316-326 (1991)
Dana H. Ballard, Steven D. Whitehead, "Reactive behavior, learning, and anticipation" (1989)
Steven D. Whitehead, Dana H. Ballard, "A role for anticipation in reactive systems that learn," International Conference on Machine Learning, pp. 354-357 (1989), 10.1016/B978-1-55860-036-2.50090-4
Andrew G. Barto, Richard S. Sutton, Charles W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Transactions on Systems, Man, and Cybernetics, vol. 13, pp. 834-846 (1983), 10.1109/TSMC.1983.6313077
Richard S. Sutton, "Learning to Predict by the Methods of Temporal Differences," Machine Learning, vol. 3, pp. 9-44 (1988), 10.1023/A:1022633531479
J.A. Franklin, "Refinement of robot motor skills through reinforcement learning," Conference on Decision and Control, pp. 1096-1101 (1988), 10.1109/CDC.1988.194487
Paul R. Thagard, Richard E. Nisbett, Keith J. Holyoak, John H. Holland, Stephen W. Smoliar, "Induction: Processes of Inference, Learning, and Discovery" (1989)
Richard S. Sutton, "First results with Dyna, an integrated architecture for learning, planning and reacting," Neural Networks for Control, pp. 179-189 (1990)