Policy evaluation using the Ω-return

Authors: Georgios Theocharous, George Konidaris, Scott Niekum, Philip S. Thomas

Abstract: We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
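
For context, the λ-return that the abstract uses as its baseline can be written as a weighted average of n-step returns. The sketch below shows only that standard weighting for a single finished episode; it is not the Ω-return and not the authors' code. The function names and the toy episode are hypothetical, chosen for illustration. As the abstract describes it, the Ω-return would differ in how the weights are chosen, because it accounts for the correlation between returns of different lengths.

```python
def n_step_returns(rewards, values, gamma):
    """Return [G^(1), ..., G^(L)] measured from time 0 for an episode of L rewards.

    rewards[k] is the reward received on step k. values[k] is the value
    estimate of the state reached after step k, so values[-1] should be 0.0
    when the episode terminates there. (Hypothetical helper for illustration.)
    """
    returns = []
    discounted_sum = 0.0
    for n, (r, v) in enumerate(zip(rewards, values), start=1):
        discounted_sum += gamma ** (n - 1) * r
        # n-step return: n discounted rewards plus a bootstrapped tail.
        returns.append(discounted_sum + gamma ** n * v)
    return returns


def lambda_weights(length, lam):
    """Episodic lambda-return weights: (1 - lam) * lam**(n - 1) for n < length,
    with the remaining mass lam**(length - 1) on the full-episode return."""
    weights = [(1.0 - lam) * lam ** (n - 1) for n in range(1, length)]
    weights.append(lam ** (length - 1))
    return weights


def complex_return(returns, weights):
    """Weighted average of n-step returns."""
    return sum(w * g for w, g in zip(weights, returns))


# Toy episode (made-up numbers): three steps, reward 1 on each step.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.0]
returns = n_step_returns(rewards, values, gamma=1.0)
lam_return = complex_return(returns, lambda_weights(len(returns), lam=0.5))
```

With `lam=0` the weights put all mass on the one-step return, and with `lam=1` all mass goes to the full Monte Carlo return, which is the usual TD(λ) interpolation.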
