Authors: Ashique Rupam Mahmood, Richard S. Sutton, Martha White, Huizhen Yu
DOI:
Keywords: Function approximation, Temporal difference learning, Artificial intelligence, Flexibility (engineering), Computer science, Discounting, Linear function, Bootstrapping
Abstract: Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by Sutton, Mahmood and White (2015) and by Yu (2015) show that, by varying the emphasis in a particular way, these algorithms become stable and convergent under off-policy training with linear function approximation. This paper serves as a unified summary of the available results from both works. In addition, we demonstrate the empirical benefits of the flexibility of emphatic algorithms, including state-dependent discounting, state-dependent bootstrapping, and user-specified allocation of function approximation resources.
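As a rough illustration of the emphatic TD(λ) update scheme the abstract refers to (follow-on trace, emphasis, and emphatic eligibility trace, following Sutton, Mahmood and White), here is a minimal sketch on an assumed two-state chain MRP with tabular features. The chain, step size, and on-policy setting (importance-sampling ratio ρ = 1) are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Assumed example problem: two-state chain, state 0 -> state 1 (reward 0),
# state 1 -> terminal (reward 1). Tabular (one-hot) features. On-policy here,
# so rho = 1 throughout, but the updates below are the general emphatic ones.
X = np.eye(2)          # feature vectors for states 0 and 1
gamma = 0.9            # discount (could be state-dependent in general)
lam, alpha = 0.5, 0.1  # bootstrapping parameter and step size
w = np.zeros(2)        # weight vector of the linear value estimate

for _ in range(500):                     # episodes
    F, e, rho_prev = 0.0, np.zeros(2), 1.0
    # transitions as (state, reward, next_state, gamma_next)
    for s, r, s_next, gamma_next in [(0, 0.0, 1, gamma), (1, 1.0, None, 0.0)]:
        rho, interest = 1.0, 1.0                 # on-policy, uniform interest i(S_t)
        F = rho_prev * gamma * F + interest      # follow-on trace F_t
        M = lam * interest + (1 - lam) * F       # emphasis M_t
        e = rho * (gamma * lam * e + M * X[s])   # emphatic eligibility trace
        v_next = 0.0 if s_next is None else w @ X[s_next]
        delta = r + gamma_next * v_next - w @ X[s]   # TD error
        w += alpha * delta * e                   # emphatic TD(lambda) update
        rho_prev = rho
```

On this chain the true values are v(1) = 1 and v(0) = 0.9, and the tabular sketch converges toward them; the point of the emphasis M is that it reweights updates by how much each state is "followed on" from states of interest, which is what makes the off-policy linear case stable.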