Author: Richard S. Sutton
DOI:
Keywords: Parameterized complexity, Artificial neural network, Dynamic programming, Artificial intelligence, Reinforcement learning, Computer science, Function approximation, Coding (social sciences), Monte Carlo method
Abstract: On large problems, reinforcement learning systems must use parameterized function approximators such as neural networks in order to generalize between similar situations and actions. In these cases there are no strong theoretical results on the accuracy of convergence, and computational results have been mixed. In particular, Boyan and Moore reported at last year's meeting a series of negative results in attempting to apply dynamic programming together with function approximation to simple control problems with continuous state spaces. In this paper, we present positive results for all the control tasks they attempted, and for one that is significantly larger. The most important differences are that we used sparse-coarse-coded function approximators (CMACs) whereas they used mostly global function approximators, and that we learned online whereas they learned offline. Boyan and Moore and others have suggested that the problems they encountered could be solved by using actual outcomes ("rollouts"), as in classical Monte Carlo methods, and as in the TD(λ) algorithm when λ = 1. However, in our experiments this always resulted in substantially poorer performance. We conclude that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
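The sparse coarse coding the abstract refers to can be illustrated with a minimal tile-coding (CMAC) sketch over a 2-D continuous state space. All parameters here (number of tilings, tiles per dimension, state bounds, learning rate) are illustrative assumptions, not the settings used in the paper:

```python
# Minimal sketch of a sparse-coarse-coded (CMAC / tile-coding) linear
# function approximator, trained online. Hypothetical parameters; the
# paper's actual tilings and step sizes are not reproduced here.

class CMAC:
    def __init__(self, n_tilings=8, tiles_per_dim=8,
                 lows=(0.0, 0.0), highs=(1.0, 1.0)):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.lows, self.highs = lows, highs
        # One weight per tile; all tilings share one flat weight vector.
        n_dims = len(lows)
        self.weights = [0.0] * (n_tilings * tiles_per_dim ** n_dims)

    def active_tiles(self, state):
        """Return one active tile index per tiling (sparse binary features)."""
        indices = []
        for t in range(self.n_tilings):
            idx = t * self.tiles_per_dim ** len(state)
            stride = 1
            for x, lo, hi in zip(state, self.lows, self.highs):
                # Each tiling is shifted by a fraction of a tile width,
                # so nearby states share some but not all tiles.
                scaled = (x - lo) / (hi - lo) * self.tiles_per_dim
                offset = t / self.n_tilings
                tile = min(int(scaled + offset), self.tiles_per_dim - 1)
                idx += tile * stride
                stride *= self.tiles_per_dim
            indices.append(idx)
        return indices

    def value(self, state):
        # Linear value: sum of the weights of the active tiles.
        return sum(self.weights[i] for i in self.active_tiles(state))

    def update(self, state, target, alpha=0.5):
        """One online gradient step moving value(state) toward target."""
        tiles = self.active_tiles(state)
        error = target - self.value(state)
        for i in tiles:
            self.weights[i] += alpha / len(tiles) * error
```

Because only a handful of tiles are active for any state, each update is cheap and generalizes locally: states that share tiles move together, while distant states are unaffected, which is the contrast with the global approximators the abstract describes.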