Continuous control with deep reinforcement learning

Authors: Yuval Tassa, Daan Wierstra, Alexander Pritzel, Tom Erez, Jonathan J. Hunt

DOI:

Keywords: Computer science, Action (philosophy), Domain (software engineering), Network architecture, Reinforcement learning, Artificial intelligence, Control (management)

Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
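
The abstract describes an actor-critic, model-free method built on the deterministic policy gradient. Below is a minimal pure-Python sketch of that kind of update on a hypothetical one-dimensional task. It assumes a linear actor mu(s) = w*s + b, a critic that is linear in quadratic features of (s, a), a replay buffer, and softly updated target copies of both. The "ideas underlying the success of Deep Q-Learning" are read here as experience replay and slowly tracking target networks. The task, the feature map, the learning rates, the Gaussian exploration noise and every function name (`features`, `ddpg_update`, `env_step`, `train`, ...) are illustrative assumptions, not the authors' networks, noise process or hyper-parameters.

```python
import random
from collections import deque

# Hypothetical toy task (not from the paper): scalar state s, scalar action a.
# Critic: Q(s, a) = theta . phi(s, a), linear in quadratic features.
# Actor: deterministic linear policy mu(s) = w * s + b.


def features(s, a):
    return [1.0, s, a, s * a, s * s, a * a]


def q_value(theta, s, a):
    return sum(t * f for t, f in zip(theta, features(s, a)))


def dq_da(theta, s, a):
    # Analytic dQ/da for the feature map above.
    return theta[2] + theta[3] * s + 2.0 * theta[5] * a


def policy(actor, s):
    w, b = actor
    return w * s + b


def soft_update(target, source, tau):
    # Polyak averaging: target <- tau * source + (1 - tau) * target.
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(source, target)]


class ReplayBuffer:
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        return rng.sample(list(self.data), batch_size)


def ddpg_update(batch, critic, actor, critic_tgt, actor_tgt,
                gamma=0.9, lr_critic=0.01, lr_actor=0.001):
    n = len(batch)
    new_critic = list(critic)
    for s, a, r, s2, done in batch:
        # TD target bootstraps from the *target* actor and critic.
        y = r if done else r + gamma * q_value(critic_tgt, s2, policy(actor_tgt, s2))
        td = y - q_value(critic, s, a)
        for i, f in enumerate(features(s, a)):
            new_critic[i] += lr_critic * td * f / n
    # Deterministic policy gradient: dQ/da (from the updated critic) * dmu/dparams.
    gw = gb = 0.0
    for s, _, _, _, _ in batch:
        g = dq_da(new_critic, s, policy(actor, s))
        gw += g * s  # dmu/dw = s
        gb += g      # dmu/db = 1
    new_actor = [actor[0] + lr_actor * gw / n, actor[1] + lr_actor * gb / n]
    return new_critic, new_actor


def env_step(s, a):
    # Hypothetical 1-D task: drive s toward 0 with a small action penalty.
    s2 = max(-2.0, min(2.0, s + a))
    return s2, -(s2 * s2) - 0.1 * a * a


def train(steps=2000, seed=0, tau=0.01, batch_size=32, noise_std=0.3):
    rng = random.Random(seed)
    critic, actor = [0.0] * 6, [0.0, 0.0]
    critic_tgt, actor_tgt = list(critic), list(actor)
    buffer = ReplayBuffer(10_000)
    s = rng.uniform(-1.0, 1.0)
    for t in range(steps):
        # Exploration: Gaussian noise on the deterministic action, clipped to [-1, 1].
        a = max(-1.0, min(1.0, policy(actor, s) + rng.gauss(0.0, noise_std)))
        s2, r = env_step(s, a)
        done = (t + 1) % 20 == 0  # fixed-length episodes of 20 steps
        buffer.add((s, a, r, s2, done))
        s = rng.uniform(-1.0, 1.0) if done else s2
        if len(buffer.data) >= batch_size:
            batch = buffer.sample(batch_size, rng)
            critic, actor = ddpg_update(batch, critic, actor, critic_tgt, actor_tgt)
            critic_tgt = soft_update(critic_tgt, critic, tau)
            actor_tgt = soft_update(actor_tgt, actor, tau)
    return critic, actor
```

Calling `train(steps=2000)` returns the critic and actor parameters after training. The actor is improved by following the critic's action-gradient dQ/da through the policy, not by a likelihood-ratio estimate. This is what "deterministic policy gradient" refers to. In the paper both functions are neural networks. The linear forms here only keep the sketch dependency-free.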
