Authors: Dale Schuurmans, Nicolas Le Roux, Mohammad Norouzi, Zafarali Ahmed
DOI:
Keywords:
Abstract: Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that, even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. We then show qualitatively that, in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.
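The abstract combines two ingredients: an entropy-regularized objective and landscape visualizations obtained by evaluating that objective along random perturbation directions in parameter space. The sketch below is a minimal illustration of both ideas, not the paper's actual code: it uses a hypothetical softmax bandit policy with assumed toy rewards, an entropy bonus weighted by a temperature tau, and a 1-D slice of the landscape along a random direction. All names and values are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): a one-state MDP (bandit)
# with fixed per-action rewards and a softmax policy over actions.
rewards = np.array([1.0, 0.8, 0.2])  # assumed rewards, for illustration only

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def objective(theta, tau):
    """Expected reward plus a tau-weighted entropy bonus."""
    pi = softmax(theta)
    entropy = -np.sum(pi * np.log(pi + 1e-12))
    return pi @ rewards + tau * entropy

def landscape_slice(theta, tau, n_points=101, radius=5.0, seed=0):
    """Evaluate the objective along one random direction in parameter
    space: a 1-D slice of the (randomly perturbed) landscape."""
    rng = np.random.default_rng(seed)
    direction = rng.standard_normal(theta.shape)
    direction /= np.linalg.norm(direction)
    alphas = np.linspace(-radius, radius, n_points)
    values = np.array([objective(theta + a * direction, tau) for a in alphas])
    return alphas, values

theta0 = np.zeros(3)
for tau in (0.0, 0.1, 1.0):  # higher tau should yield a smoother slice
    alphas, values = landscape_slice(theta0, tau)
    print(f"tau={tau}: min={values.min():.3f}, max={values.max():.3f}")
```

Plotting `values` against `alphas` for several random seeds gives the kind of random-direction landscape view the abstract describes; repeating this at increasing tau illustrates, in this toy setting, how a larger entropy weight can smooth the slice.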