Authors: Shixiang Gu, Sergey Levine, Ethan Holly, Timothy Lillicrap
DOI:
Keywords:
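Abstract: Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.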
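
The data flow described in the abstract can be illustrated with a minimal sketch: several simulated robots collect experience in parallel, a trainer updates the parameters from batches of that experience, and each robot fetches the latest parameters before starting an episode. Everything here is an assumption made for illustration and is not taken from the source: the names `ParameterStore`, `ExperienceBuffer`, `run_robot`, `train` and `run_demo` are hypothetical, the "policy network" is reduced to a single weight, the task is a toy one-step problem, and a crude elite-regression update stands in for the actual deep reinforcement learning update.

```python
import random
import threading
import time


class ParameterStore:
    """Thread-safe holder for the current policy parameters (one weight here)."""

    def __init__(self, w):
        self._lock = threading.Lock()
        self._w = w
        self._version = 0

    def get(self):
        with self._lock:
            return self._w, self._version

    def set(self, w):
        with self._lock:
            self._w = w
            self._version += 1


class ExperienceBuffer:
    """Thread-safe store of (state, action, reward) tuples from all robots."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, item):
        with self._lock:
            self._items.append(item)

    def sample(self, n, rng):
        # Returns None until at least n transitions have been collected.
        with self._lock:
            if len(self._items) < n:
                return None
            return rng.sample(self._items, n)

    def __len__(self):
        with self._lock:
            return len(self._items)


def run_robot(seed, store, buffer, episodes, steps):
    """Simulated robot: refreshes parameters before each episode, then explores."""
    rng = random.Random(seed)
    for _ in range(episodes):
        w, _ = store.get()  # fetch the latest parameters prior to the episode
        for _ in range(steps):
            s = rng.uniform(-1.0, 1.0)
            a = w * s + rng.gauss(0.0, 0.3)  # policy output plus exploration noise
            r = -(a - 2.0 * s) ** 2          # toy reward: best action is 2 * s
            buffer.add((s, a, r))
        time.sleep(0.005)  # stand-in for the time a real episode would take


def train(store, buffer, updates, batch_size, lr=0.5):
    """Trainer: repeatedly updates the parameters from a sampled batch."""
    rng = random.Random(0)
    done = 0
    while done < updates:
        batch = buffer.sample(batch_size, rng)
        if batch is None:
            time.sleep(0.001)  # wait for the robots to collect more data
            continue
        # Placeholder update (not deep RL): least-squares fit of action to
        # state over the higher-reward half of the batch.
        elite = sorted(batch, key=lambda t: t[2], reverse=True)[: batch_size // 2]
        denom = sum(s * s for s, _, _ in elite)
        w, _ = store.get()
        if denom > 0.0:
            w_fit = sum(s * a for s, a, _ in elite) / denom
            w += lr * (w_fit - w)
        store.set(w)
        done += 1


def run_demo(robots=4, episodes=5, steps=20, updates=50, batch_size=32):
    store, buffer = ParameterStore(0.0), ExperienceBuffer()
    threads = [
        threading.Thread(target=run_robot, args=(i, store, buffer, episodes, steps))
        for i in range(robots)
    ]
    threads.append(
        threading.Thread(target=train, args=(store, buffer, updates, batch_size))
    )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    w, version = store.get()
    return w, version, len(buffer)
```

Calling `run_demo()` returns the final weight, the number of parameter updates applied, and the number of transitions collected. Because the robots and the trainer run concurrently, the final weight varies from run to run; the sketch is meant to show the collection and update structure, not the learning behavior of the described system.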