Authors: Christopher G. Atkeson, Andrew W. Moore
DOI:
Keywords: Incremental learning, Dynamic programming, Computer science, Computation, Stochastic control, Reinforcement learning, Mathematical optimization, Control (management)
Abstract: We present a new algorithm, Prioritized Sweeping, for efficient prediction and control of stochastic Markov systems. Incremental learning methods such as Temporal Differencing and Q-learning have fast real-time performance. Classical methods are slower, but more accurate, because they make full use of the observations. Prioritized Sweeping aims for the best of both worlds. It uses all previous experiences both to prioritize important dynamic programming sweeps and to guide the exploration of state-space. We compare Prioritized Sweeping with other reinforcement learning schemes for a number of different stochastic optimal control problems. It successfully solves large state-space real-time problems with which other methods have difficulty.
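
The idea of ordering dynamic programming backups by priority can be illustrated with a small tabular sketch. This is a simplified illustration under stated assumptions, not the authors' exact algorithm: the class name `PrioritizedSweeping`, the parameters `gamma`, `theta`, and `max_backups`, and the choice of the absolute Bellman error as the priority are all hypothetical choices made for this example. It estimates a transition and reward model from observed experience, then backs up the states whose values are expected to change most, propagating changes to their predecessors.

```python
import heapq
import itertools
from collections import defaultdict


class PrioritizedSweeping:
    """Tabular sketch: learn a model from experience, back up high-priority states first."""

    def __init__(self, gamma=0.9, theta=1e-4, max_backups=50):
        self.gamma = gamma              # discount factor (assumed)
        self.theta = theta              # minimum priority worth queueing (assumed)
        self.max_backups = max_backups  # backups allowed per real observation (assumed)
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> summed reward
        self.visits = defaultdict(int)                       # (s, a) -> visit count
        self.actions = defaultdict(set)                      # s -> actions tried in s
        self.preds = defaultdict(set)                        # s2 -> states seen leading to s2
        self.V = defaultdict(float)                          # state-value estimates
        self.queue = []                                      # heap of (-priority, tie, state)
        self._tie = itertools.count()                        # tie-breaker for equal priorities

    def _backup(self, s):
        """One-step lookahead value of s under the estimated model."""
        best = None
        for a in self.actions[s]:
            n = self.visits[(s, a)]
            q = self.reward_sum[(s, a)] / n
            q += self.gamma * sum(
                c / n * self.V[s2] for s2, c in self.counts[(s, a)].items()
            )
            best = q if best is None else max(best, q)
        return 0.0 if best is None else best

    def _push(self, s):
        """Queue s if its Bellman error exceeds the threshold."""
        priority = abs(self._backup(s) - self.V[s])
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, next(self._tie), s))

    def observe(self, s, a, r, s2):
        """Record one real transition, then sweep the highest-priority states."""
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1
        self.actions[s].add(a)
        self.preds[s2].add(s)
        self._push(s)
        for _ in range(self.max_backups):
            if not self.queue:
                break
            _, _, state = heapq.heappop(self.queue)
            self.V[state] = self._backup(state)
            # A change at `state` may change the values of its predecessors.
            for p in self.preds[state]:
                self._push(p)
```

In this sketch, each call to `observe(s, a, r, s2)` updates the model with one real transition and then spends up to `max_backups` backups on the states with the largest pending value changes, so a newly discovered reward reaches earlier states without a full sweep over the state-space.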