Authors: Adam White, Martha White, Sina Ghiassian, Andrew Patterson, Shivam Garg
DOI:
Keywords: Range (mathematics), Divergence (statistics), Temporal difference learning, Function approximation, Algorithm, Soundness, Complex algorithm, Face (geometry), Computer science, Work (physics)
Abstract: It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy-to-use and performant TD method, or a more complex algorithm that is more sound but harder to tune and all but unexplored for non-linear function approximation and control. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC) that attempts to balance ease of use, soundness, and performance. It behaves as well as TD when TD performs well, but remains sound in cases where TD diverges. We empirically investigate TDRC across a range of problems, for both prediction and control, with linear function approximation, and show, potentially for the first time, that gradient TD methods could be a better alternative to Q-learning.
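To make the idea in the abstract concrete, below is a minimal sketch of a linear TDRC-style prediction update: a TDC-like gradient correction with an L2 penalty on the secondary weight vector. The function name, step size, regularization strength, and the toy data are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def tdrc_update(w, h, x, r, x_next, gamma, alpha, beta=1.0):
    """Sketch of one linear TDRC-style update.

    w      -- primary value-function weights
    h      -- secondary (correction) weights, regularized toward zero
    x      -- feature vector for the current state
    x_next -- feature vector for the next state
    r      -- observed reward, gamma -- discount factor
    alpha  -- step size, beta -- regularization strength (assumed values)
    """
    delta = r + gamma * (w @ x_next) - (w @ x)               # TD error
    w = w + alpha * (delta * x - gamma * (h @ x) * x_next)   # TDC-style corrected update
    h = h + alpha * ((delta - h @ x) * x - beta * h)         # secondary weights with L2 penalty
    return w, h

# Illustrative usage on random features (not a real environment).
rng = np.random.default_rng(0)
d = 5
w, h = np.zeros(d), np.zeros(d)
for _ in range(100):
    x, x_next = rng.normal(size=d), rng.normal(size=d)
    r = rng.normal()
    w, h = tdrc_update(w, h, x, r, x_next, gamma=0.9, alpha=0.05)
```

With beta set to zero this reduces to a plain gradient-correction (TDC-style) update, which reflects the trade-off the abstract describes between the ease of use of TD and the soundness of Gradient TD methods.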