Authors: Yuval Tassa, Daan Wierstra, Alexander Pritzel, Tom Erez, Jonathan J. Hunt
DOI:
Keywords: Computer science, Action (philosophy), Domain (software engineering), Network architecture, Reinforcement learning, Artificial intelligence, Control (management)
Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
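The abstract describes an actor-critic method built on the deterministic policy gradient: a critic learns Q(s, a) by bootstrapping, and the actor is updated by ascending the critic's value of its own actions. The following is a minimal sketch of that core update in PyTorch, not the authors' implementation; all network sizes, learning rates, and variable names are assumptions, and the full algorithm additionally relies on a replay buffer, target networks, and exploration noise.

import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1  # illustrative sizes, e.g. a pendulum-like task

# Small MLP actor and critic; tanh bounds the continuous action.
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

def update(batch):
    """One actor-critic update on a batch of (s, a, r, s2, done) tensors."""
    s, a, r, s2, done = batch
    # Critic: regress Q(s, a) toward the one-step bootstrapped target.
    with torch.no_grad():
        q_next = critic(torch.cat([s2, actor(s2)], dim=-1))
        target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: deterministic policy gradient, i.e. ascend Q(s, actor(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Illustrative call with random data standing in for a replay-buffer sample:
B = 32
batch = (torch.randn(B, obs_dim), torch.randn(B, act_dim),
         torch.randn(B, 1), torch.randn(B, obs_dim), torch.zeros(B, 1))
update(batch)

Because the policy is deterministic, the actor's gradient flows through the critic via the chain rule, which is what lets the same recipe scale across the varied tasks the abstract lists.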