Author: Steven D. Whitehead
DOI:
Keywords: Context (language use), Reinforcement learning, Intelligent agent, Artificial intelligence, Optimal decision, Time complexity, State space, Decision problem, Learning classifier system, Computer science
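Abstract: Reinforcement learning algorithms, when used to solve multi-stage decision problems, perform a kind of online (incremental) search to find an optimal decision policy. The time complexity of this search strongly depends upon the size and structure of the state space and upon a priori knowledge encoded in the learner's initial parameter values. When a priori knowledge is not available, search is unbiased and can be excessive. Cooperative mechanisms help reduce search by providing the learner with shorter latency feedback and auxiliary sources of experience. These mechanisms are based on the observation that in nature, intelligent agents exist in a cooperative social environment that helps structure and guide learning. Within this context, learning involves information transfer as much as it does discovery by trial-and-error. Two cooperative mechanisms are described: Learning with an External Critic (or LEC) and Learning By Watching (or LBW). The search time complexity of these algorithms, along with unbiased Q-learning, is analyzed for problem solving tasks on a restricted class of state spaces. The results indicate that while unbiased search can be expected to require time moderately exponential in the size of the state space, the LEC and LBW algorithms require at most time linear in the size of the state space and, under appropriate conditions, are independent of the state space size altogether, requiring time proportional to the length of the optimal solution path. While these analytic results apply only to a restricted class of tasks, they shed light on the complexity of search in reinforcement learning in general and the utility of cooperative mechanisms for reducing search.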