Authors: Pieter Abbeel, Chelsea Finn, Sergey Levine, Marvin Zhang, Zoe McCarthy
DOI:
Keywords:
Abstract: Policy learning for partially observed control tasks requires policies that can remember salient information from past observations. In this paper, we present a method for learning policies with internal memory for high-dimensional, continuous systems, such as robotic manipulators. Our approach consists of augmenting the state and action space of the system with continuous-valued memory states that the policy can read from and write to. Learning general-purpose policies with this type of memory representation directly is difficult, because the policy must automatically figure out what is most important to memorize at each time step. We show that, by decomposing the policy search problem into a trajectory optimization phase and a supervised learning phase, through a method called guided policy search, we can acquire policies with effective memorization and recall strategies. Intuitively, the trajectory optimization phase chooses the values of the memory states that will make it easier for the policy to produce the right actions in future states, while the supervised learning phase encourages the policy to use memorization actions to produce those memory states. We evaluate our method on tasks involving manipulation and navigation settings, and show that the learned policies can successfully complete a range of tasks that require memory.
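To make the core idea of the abstract concrete, the sketch below illustrates (in plain NumPy, not the authors' code) how a state and action space can be augmented with continuous-valued memory states that a policy reads from and writes to. All names (`augment`, `step_policy`, the additive memory update) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def augment(observation: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Concatenate the current observation with the memory state so the
    policy can condition on remembered information (assumed layout)."""
    return np.concatenate([observation, memory])

def step_policy(policy, observation: np.ndarray, memory: np.ndarray):
    """One control step under the augmented formulation.

    The policy maps the augmented state to an augmented action: a physical
    action plus a 'memory write'. Here the write is applied additively,
    which is one simple choice, not necessarily the paper's.
    """
    augmented_state = augment(observation, memory)
    output = policy(augmented_state)              # shape: (act_dim + mem_dim,)
    mem_dim = memory.shape[0]
    action, memory_write = output[:-mem_dim], output[-mem_dim:]
    new_memory = memory + memory_write            # memory "action" updates the memory state
    return action, new_memory

# Example usage with a hypothetical linear policy:
obs_dim, act_dim, mem_dim = 4, 2, 3
W = np.zeros((act_dim + mem_dim, obs_dim + mem_dim))
policy = lambda x: W @ x
action, memory = step_policy(policy, np.zeros(obs_dim), np.zeros(mem_dim))
```

In this view, memorization and recall are just extra dimensions of the control problem, which is what allows trajectory optimization and supervised learning (as in guided policy search) to be applied to them directly.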