Addressing Function Approximation Error in Actor-Critic Methods

Authors: Scott Fujimoto, Herke van Hoof, David Meger

Abstract: In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.
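As a rough illustration of the two mechanisms the abstract describes (taking the minimum over a pair of critics, and delaying policy updates), below is a minimal Python sketch of the resulting Bellman target. All names here (clipped_double_q_target, actor_target, q1_target, q2_target) are hypothetical stand-ins, not the authors' reference implementation.

```python
def clipped_double_q_target(reward, next_state, done,
                            actor_target, q1_target, q2_target,
                            gamma=0.99):
    """Bellman target using the minimum over a pair of target critics
    (hypothetical sketch of the mechanism described in the abstract)."""
    # Action proposed by the target actor; in the full algorithm the actor
    # and target networks are refreshed only every d critic updates
    # (the delayed policy updates mentioned in the abstract).
    next_action = actor_target(next_state)
    # Take the minimum of the two target critics to limit overestimation.
    target_q = min(q1_target(next_state, next_action),
                   q2_target(next_state, next_action))
    # Standard discounted backup; `done` masks bootstrapping at terminals.
    return reward + gamma * (1.0 - done) * target_q


# Toy usage with scalar stand-in networks: the second critic is
# systematically higher, and the minimum discards its overestimate.
actor = lambda s: 0.5 * s
q1 = lambda s, a: s + a
q2 = lambda s, a: s + a + 0.3
y = clipped_double_q_target(reward=1.0, next_state=2.0, done=0.0,
                            actor_target=actor, q1_target=q1, q2_target=q2)
print(y)  # 1.0 + 0.99 * min(3.0, 3.3) = 3.97
```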
