Continuous control with deep reinforcement learning

Authors: Yuval Tassa, Daan Wierstra, Alexander Pritzel, Tom Erez, Jonathan J. Hunt

Keywords: Computer science; Action (philosophy); Domain (software engineering); Network architecture; Reinforcement learning; Artificial intelligence; Control (management)

Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
