Authors: Haichuan Yang, Ji Liu, Han Liu
DOI:
Keywords:
Abstract: Many complex domains, such as robotics control and real-time strategy (RTS) games, require an agent to learn a continuous control. In the former, the agent learns a policy over $\mathbb{R}^d$; in the latter, over a discrete set of actions, each of which is parametrized by a continuous parameter. Such problems are naturally solved using policy-based reinforcement learning (RL) methods, but unfortunately these often suffer from high variance, leading to instability and slow convergence. Unnecessary variance is introduced whenever policies over bounded action spaces are modeled as distributions with unbounded support, by applying a transformation $T$ to the sampled action before execution in the environment. Recently, the variance-reduced clipped action policy gradient (CAPG) was introduced for actions in bounded intervals, but to date no variance-reduced methods exist when the action is a direction, something often seen in RTS games. To this end we introduce the angular policy gradient (APG), a stochastic policy gradient method for directional control. With the marginal policy gradients family of estimators we present a unified analysis of the variance reduction properties of APG and CAPG; our results provide a stronger guarantee than existing analyses of CAPG. Experimental results on a popular RTS game and a navigation task show that the APG estimator offers a substantial improvement over the standard policy gradient.
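To make the directional setting concrete, a minimal sketch follows under illustrative notation of our own (the symbols $\mu_\theta$ and $\hat{Q}$ below are assumptions for exposition, not taken from the paper). The policy samples an unconstrained action, but the environment consumes only its direction:

$$u = T(a) = \frac{a}{\lVert a \rVert_2} \in S^{d-1}, \qquad a \sim \pi_\theta(\cdot \mid s) \ \text{on } \mathbb{R}^d .$$

The standard policy gradient scores the pre-transformation sample $a$,

$$\hat{g}_{\mathrm{PG}} = \nabla_\theta \log \pi_\theta(a \mid s)\, \hat{Q}(s, u),$$

whereas a marginal (angular) estimator instead scores the executed direction $u$ under its marginal density $\mu_\theta$ on the sphere,

$$\hat{g}_{\mathrm{APG}} = \nabla_\theta \log \mu_\theta(u \mid s)\, \hat{Q}(s, u).$$

Since the return depends on $a$ only through $u = T(a)$, both estimators target the same objective; scoring the marginal density discards the components of the score that $T$ collapses, which is the source of the unnecessary variance described above.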