Authors: Pieter Abbeel, Sergey Levine, Justin Fu
DOI:
Keywords:
Abstract: One of the key challenges in applying reinforcement learning to complex robotic control tasks is the need to gather large amounts of experience in order to find an effective policy for the task at hand. Model-based reinforcement learning can achieve good sample efficiency, but it requires the ability to learn a model of the dynamics that is accurate enough to produce an effective policy. In this work, we develop a model-based algorithm that combines prior knowledge from previous tasks with online adaptation of the dynamics model. These two ingredients enable highly sample-efficient learning even in regimes where estimating the true dynamics is very difficult, since the online adaptation allows the method to locally compensate for unmodeled variation in the dynamics. We encode the prior knowledge into a neural network dynamics model, adapt it online by progressively refitting local linear dynamics, and use model-predictive control to plan under these dynamics. Our experimental results show that this approach can be used to solve a variety of manipulation tasks in just a single attempt, using data from other behaviors.
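The abstract names two concrete ingredients: refitting local linear dynamics from recently observed transitions, and planning under the fitted model with model-predictive control. The sketch below, which is not the authors' implementation, illustrates one minimal way these pieces could fit together: a ridge-regularized least-squares fit of x' ≈ A x + B u + c on a window of transitions, followed by random-shooting MPC under a quadratic distance-to-goal cost. The neural-network prior is omitted, and all function names, the cost, and the toy double-integrator system are illustrative assumptions.

```python
# Hedged sketch: local linear dynamics refitting + random-shooting MPC.
# Not the paper's method; the neural-network prior and robot dynamics are
# replaced by stand-ins for illustration.
import numpy as np

def fit_local_linear_dynamics(states, actions, next_states, reg=1e-3):
    """Least-squares fit of x' = A x + B u + c on a window of recent data."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])  # rows [x, u, 1]
    # Ridge-regularized solve for the stacked parameter matrix [A; B; c].
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ next_states)
    dx, du = states.shape[1], actions.shape[1]
    A, B, c = W[:dx].T, W[dx:dx + du].T, W[-1]
    return A, B, c

def mpc_random_shooting(x0, A, B, c, x_goal, horizon=10, n_samples=256, u_scale=1.0):
    """Return the first action of the best sampled action sequence under the
    fitted local linear model and a quadratic distance-to-goal cost."""
    du = B.shape[1]
    U = np.random.uniform(-u_scale, u_scale, size=(n_samples, horizon, du))
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        x = x0.copy()
        for t in range(horizon):
            x = A @ x + B @ U[i, t] + c            # roll out the fitted model
            costs[i] += np.sum((x - x_goal) ** 2)  # accumulate quadratic cost
    return U[np.argmin(costs), 0]

if __name__ == "__main__":
    # Toy double-integrator standing in for the robot dynamics.
    rng = np.random.default_rng(0)
    dt, dx, du = 0.1, 2, 1
    true_step = lambda x, u: np.array([x[0] + dt * x[1], x[1] + dt * u[0]])
    # Collect a small window of transitions; in the paper this local fit would
    # be combined with the neural-network prior, which is omitted here.
    xs = rng.normal(size=(50, dx))
    us = rng.normal(size=(50, du))
    xns = np.array([true_step(x, u) for x, u in zip(xs, us)])
    A, B, c = fit_local_linear_dynamics(xs, us, xns)
    x, x_goal = np.array([1.0, 0.0]), np.zeros(dx)
    for _ in range(50):
        u = mpc_random_shooting(x, A, B, c, x_goal)
        x = true_step(x, u)
    print("final state:", x)  # should move toward the goal
```

Random-shooting MPC is used here only to keep the sketch short and dependency-free; a trajectory-optimization-based planner could be substituted without changing the local-linear refitting step.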