Gradient Temporal-Difference Learning with Regularized Corrections

Authors: Adam White, Martha White, Sina Ghiassian, Andrew Patterson, Shivam Garg

DOI:

Keywords: Range (mathematics), Divergence (statistics), Temporal difference learning, Function approximation, Algorithm, Soundness, Complex algorithm, Face (geometry), Computer science, Work (physics)

Abstract: It is still common to use Q-learning and temporal difference (TD) learning, even though they have divergence issues and sound Gradient TD alternatives exist, because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy-to-use and performant TD method, or a more complex algorithm that is sound but harder to tune and largely unexplored with non-linear function approximation and control. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC), which attempts to balance ease of use, soundness, and performance. It behaves as well as TD when TD performs well, but remains sound in cases where TD diverges. We empirically investigate TDRC across a range of problems, for both prediction and control, with both linear and non-linear function approximation, and show, potentially for the first time, that gradient TD methods could be a better alternative to TD and Q-learning.
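To make the idea in the abstract concrete, the following is a minimal sketch of a TDRC-style learner for linear, on-policy prediction. It assumes the TDC form of the update with an l2 penalty on the secondary "correction" weights; the class name, defaults, and the parameter beta here are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

class TDRC:
    """Sketch of a TDRC-style learner for linear prediction.

    Assumed form: TDC-style gradient corrections with an l2 penalty
    on the correction weights h (the "regularized correction").
    """

    def __init__(self, n_features, alpha=0.01, beta=1.0, gamma=0.99):
        self.w = np.zeros(n_features)  # primary value-function weights
        self.h = np.zeros(n_features)  # secondary correction weights
        self.alpha = alpha             # step size (shared here for simplicity)
        self.beta = beta               # strength of the l2 regularizer on h
        self.gamma = gamma             # discount factor

    def update(self, x, r, x_next, done):
        gamma = 0.0 if done else self.gamma
        # TD error under the current value estimate
        delta = r + gamma * (self.w @ x_next) - self.w @ x
        # TDC-style correction term built from the secondary weights
        correction = gamma * (self.h @ x) * x_next
        self.w += self.alpha * (delta * x - correction)
        # h tracks the expected TD error per feature; the l2 penalty
        # (beta * h) keeps it small, which is what lets TDRC behave
        # like plain TD when TD is already stable
        self.h += self.alpha * ((delta - self.h @ x) * x - self.beta * self.h)
        return delta

    def value(self, x):
        return self.w @ x
```

With beta set to zero this reduces to TDC, and with h pinned at zero the primary update reduces to ordinary linear TD, which is the sense in which TDRC interpolates between the two.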
