Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications

Authors: Haichuan Yang, Ji Liu, Han Liu


Abstract: Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn a continuous control policy. In the former, an agent learns a policy over $\mathbb{R}^d$, whereas in the latter it learns a policy over a discrete set of actions, each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy-based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance, leading to instability and slow convergence. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled using distributions with unbounded support, by applying a transformation $T$ to the sampled action before execution in the environment. Recently, the variance-reduced clipped action policy gradient (CAPG) was introduced for actions in bounded intervals, but to date no variance-reduced methods exist for the case when the action is a direction, something often seen in RTS games. To this end we introduce the angular policy gradient (APG), a stochastic policy gradient method for directional control. With the marginal policy gradients family of estimators we present a unified analysis of the variance reduction properties of APG and CAPG; our results provide a stronger guarantee than existing analyses of CAPG. Experimental results on a popular RTS game and a navigation task show that the APG estimator offers a substantial improvement over the standard policy gradient.
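To make the marginal-distribution idea concrete in the clipped-interval (CAPG) case the abstract mentions, here is a minimal sketch, not the authors' implementation: when an unbounded Gaussian sample is clipped to $[low, high]$ before execution, the executed action's marginal distribution places point masses at the bounds (equal to the Gaussian tail probabilities) and keeps the Gaussian density in the interior. Scoring the *executed* action under this marginal density, rather than the pre-clip Gaussian, is what removes the unnecessary variance. All function names here are illustrative.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a Gaussian N(mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu)**2 / (2.0 * sigma**2)

def normal_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def clipped_logprob(a, mu, sigma, low, high):
    """Log-probability of the executed action under the *marginal*
    (clipped) action distribution: point masses at the bounds,
    Gaussian density in the interior."""
    if a <= low:
        return math.log(normal_cdf(low, mu, sigma))        # P(sample <= low)
    if a >= high:
        return math.log(1.0 - normal_cdf(high, mu, sigma))  # P(sample >= high)
    return normal_logpdf(a, mu, sigma)
```

Differentiating `clipped_logprob` with respect to the policy parameters `mu` and `sigma` yields a CAPG-style score function: every sample that lands beyond a bound contributes the same boundary score, which is the source of the variance reduction. The angular policy gradient applies the same principle when $T$ is the normalization map $a \mapsto a/\|a\|$, scoring the executed direction under its marginal (angular) density.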

References (30)
John Schulman, Trust Region Policy Optimization, International Conference on Machine Learning, pp. 1889-1897, (2015)
Andrew Y. Ng, Daishi Harada, Stuart J. Russell, Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping, International Conference on Machine Learning, pp. 278-287, (1999)
J. T. Chang, D. Pollard, Conditioning as disintegration, Statistica Neerlandica, vol. 51, pp. 287-317, (1997), 10.1111/1467-9574.00056
David Blackwell, Conditional Expectation and Unbiased Sequential Estimation, Annals of Mathematical Statistics, vol. 18, pp. 105-110, (1947), 10.1214/AOMS/1177730497
Evan Greensmith, Peter L. Bartlett, Jonathan Baxter, Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, Journal of Machine Learning Research, vol. 5, pp. 1471-1530, (2004), 10.5555/1005332.1044710
Richard S. Sutton, Andrew G. Barto, Reinforcement Learning: An Introduction, (1998)
Aviv Tamar, Dotan Di Castro, Ron Meir, Integrating a partial model into model free reinforcement learning, Journal of Machine Learning Research, vol. 13, pp. 1927-1966, (2012), 10.5555/2188385.2343705
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, Human-level control through deep reinforcement learning, Nature, vol. 518, pp. 529-533, (2015), 10.1038/NATURE14236
Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Neural Information Processing Systems, vol. 12, pp. 1057-1063, (1999)
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller, Deterministic Policy Gradient Algorithms, International Conference on Machine Learning, pp. 387-395, (2014)