Understanding the impact of entropy on policy optimization

Authors: Dale Schuurmans, Nicolas Le Roux, Mohammad Norouzi, Zafarali Ahmed

DOI:

Keywords:

Abstract: Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. We then show qualitatively that, in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.
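
The following is a minimal sketch (not the authors' code) of the landscape-visualization idea described in the abstract: perturb the policy parameters along a random direction and evaluate the entropy-regularized objective at each perturbation, comparing different entropy weights. The softmax-bandit setup, the reward values, and the temperature grid below are illustrative assumptions.

```python
import numpy as np

rewards = np.array([1.0, 0.8, 0.1])   # hypothetical per-arm rewards
tau_values = [0.0, 0.1, 1.0]          # entropy-regularization weights to compare

def objective(theta, tau):
    """Expected reward of a softmax policy plus tau times the policy entropy."""
    logits = theta - theta.max()                      # for numerical stability
    pi = np.exp(logits) / np.exp(logits).sum()        # softmax policy
    entropy = -(pi * np.log(pi + 1e-12)).sum()
    return pi @ rewards + tau * entropy

rng = np.random.default_rng(0)
theta0 = rng.normal(size=rewards.shape)               # reference parameter vector
direction = rng.normal(size=rewards.shape)            # one random perturbation direction
direction /= np.linalg.norm(direction)

# Evaluate the objective along a 1-D slice theta0 + alpha * direction.
alphas = np.linspace(-5.0, 5.0, 101)
for tau in tau_values:
    values = [objective(theta0 + a * direction, tau) for a in alphas]
    # A larger tau typically gives a smoother, better-connected slice of the landscape.
    print(f"tau={tau}: min={min(values):.3f}, max={max(values):.3f}")
```

In practice one would plot many such random slices (or 2-D slices spanned by two random directions) rather than printing summary statistics, which is the visualization the paper builds on.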

References (40)
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel, High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv preprint, (2015)
John Schulman, Trust Region Policy Optimization. International Conference on Machine Learning, pp. 1889-1897, (2015)
Yann Le Cun, Ido Kanter, Sara A. Solla, Eigenvalues of covariance matrices: Application to neural-network learning. Physical Review Letters, vol. 66, pp. 2396-2399, (1991). 10.1103/PHYSREVLETT.66.2396
Ronald J. Williams, Jing Peng, Function Optimization Using Connectionist Reinforcement Learning Algorithms. Connection Science, vol. 3, pp. 241-268, (1991). 10.1080/09540099108946587
Olivier Chapelle, Mingrui Wu, Gradient descent optimization of smoothed information retrieval metrics. Information Retrieval, vol. 13, pp. 216-235, (2010). 10.1007/S10791-009-9110-3
Peter L. Bartlett, Evan Greensmith, Jonathan Baxter, Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning. Journal of Machine Learning Research, vol. 5, pp. 1471-1530, (2004). 10.5555/1005332.1044710
Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama, Analysis and Improvement of Policy Gradient Estimation. Neural Information Processing Systems, vol. 24, pp. 262-270, (2011)
Sham M. Kakade, A Natural Policy Gradient. Neural Information Processing Systems, vol. 14, pp. 1531-1538, (2001)
Yishay Mansour, Satinder P. Singh, Richard S. Sutton, David A. McAllester, Policy Gradient Methods for Reinforcement Learning with Function Approximation. Neural Information Processing Systems, vol. 12, pp. 1057-1063, (1999)