Learning Deep Neural Network Policies with Continuous Memory States

Authors: Pieter Abbeel, Chelsea Finn, Sergey Levine, Marvin Zhang, Zoe McCarthy

Abstract: Policy learning for partially observed control tasks requires policies that can remember salient information from past observations. In this paper, we present a method for learning policies with internal memory for high-dimensional, continuous systems, such as robotic manipulators. Our approach consists of augmenting the state and action space of the system with continuous-valued memory states that the policy can read from and write to. Learning general-purpose policies with this type of memory representation directly is difficult, because the policy must automatically figure out what is most important to memorize at each time step. We show that, by decomposing the policy search problem into a trajectory optimization phase and a supervised learning phase, through a method called guided policy search, we can acquire effective memorization and recall strategies. Intuitively, the trajectory optimization phase chooses memory values that will make it easier to produce the right action in future states, while the supervised learning phase encourages the policy to use actions that produce those memory states. We evaluate our method on tasks involving continuous control in manipulation and navigation settings, and show that it can learn complex policies that successfully complete a range of tasks that require memory.
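The core construction described in the abstract, augmenting the state and action space with continuous memory states, can be sketched as a thin wrapper around an arbitrary dynamical system. The class and function names below (`MemoryAugmentedSystem`, `step_fn`) are illustrative assumptions, not the paper's implementation; the sketch only shows the interface idea: the policy observes the physical state concatenated with its memory, and the last components of its action directly write the next memory state.

```python
import numpy as np

class MemoryAugmentedSystem:
    """Illustrative sketch (not the paper's code): augment a system's
    state and action with continuous memory states that the policy
    can read from (via the observation) and write to (via the action)."""

    def __init__(self, step_fn, state_dim, memory_dim):
        # step_fn: underlying dynamics, (state, control) -> next state
        self.step_fn = step_fn
        self.state = np.zeros(state_dim)
        self.memory = np.zeros(memory_dim)
        self.memory_dim = memory_dim

    def reset(self, initial_state):
        self.state = np.asarray(initial_state, dtype=float)
        self.memory = np.zeros(self.memory_dim)
        # Augmented observation: physical state plus memory states.
        return np.concatenate([self.state, self.memory])

    def step(self, augmented_action):
        augmented_action = np.asarray(augmented_action, dtype=float)
        # Split the action into a physical control and a memory "write".
        control = augmented_action[:-self.memory_dim]
        memory_write = augmented_action[-self.memory_dim:]
        self.state = self.step_fn(self.state, control)
        # Simplest memory dynamics: next memory equals the write action.
        self.memory = memory_write
        return np.concatenate([self.state, self.memory])
```

With this augmentation, an ordinary memoryless (reactive) policy over the augmented space behaves like a policy with memory over the original space, which is what lets trajectory optimization and supervised learning treat the memory as just more state and action dimensions.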
