A Performance-Based Start State Curriculum Framework for Reinforcement Learning

Authors: Herke van Hoof, Felix Schmitt, Jan Wöhlke

DOI:

Keywords:

Abstract: Sparse reward problems present a challenge for reinforcement learning (RL) agents. Previous work has shown that choosing start states according to a curriculum can significantly improve performance. We observe that many existing curriculum generation algorithms rely on two key components: performance measure estimation and a start selection policy. Therefore, we propose a unifying framework for performance-based start state curricula in RL, which allows us to analyze and compare the performance influence of the two components. Furthermore, a new start state selection policy using spatial performance gradients is introduced. We conduct extensive empirical evaluations to investigate the model choice for performance estimation. Benchmarking on difficult robotic navigation tasks and a high-dimensional robotic manipulation task, we demonstrate the state-of-the-art performance of our novel spatial gradient curriculum.
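The two components named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; the function names, the Monte-Carlo estimator, and the neighbor-difference "gradient" are illustrative assumptions standing in for the framework's performance estimation and start state selection policy.

```python
def estimate_performance(rollout_fn, start_states, episodes=50):
    """Component 1 (assumed form): Monte-Carlo estimate of the success
    probability of the current policy from each candidate start state."""
    return {s: sum(rollout_fn(s) for _ in range(episodes)) / episodes
            for s in start_states}

def spatial_gradient_selection(perf, neighbors):
    """Component 2 (assumed form): pick the start state whose estimated
    performance differs most from its spatial neighbors, i.e. a state on
    the frontier between mastered and unmastered regions."""
    def gradient(s):
        return max(abs(perf[s] - perf[n]) for n in neighbors(s))
    return max(perf, key=gradient)

# Toy 1-D corridor: states 0..4, goal near state 4; a fixed policy
# succeeds only when started close to the goal (states 3 and 4).
toy_rollout = lambda s: 1 if s >= 3 else 0
perf = estimate_performance(toy_rollout, range(5), episodes=10)
neighbors = lambda s: [t for t in (s - 1, s + 1) if 0 <= t <= 4]
print(spatial_gradient_selection(perf, neighbors))  # prints 2, a frontier state
```

In this toy setting the selection policy proposes a start state at the boundary of the region the policy has already mastered, which is the intuition behind growing a curriculum outward from the goal.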

References (30)
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel, High-Dimensional Continuous Control Using Generalized Advantage Estimation. arXiv preprint, (2015)
Sham Kakade, John Langford, Approximately Optimal Approximate Reinforcement Learning. International Conference on Machine Learning, pp. 267-274, (2002)
Andrew Y. Ng, Daishi Harada, Stuart J. Russell, Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. International Conference on Machine Learning, pp. 278-287, (1999)
Adrien Baranes, Pierre-Yves Oudeyer, Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems, vol. 61, pp. 49-73, (2013), 10.1016/J.ROBOT.2012.05.008
Jürgen Schmidhuber, Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, vol. 2, pp. 230-247, (2010), 10.1109/TAMD.2010.2056368
R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction. MIT Press, (1998)
M. Pawan Kumar, Benjamin Packer, Daphne Koller, Self-Paced Learning for Latent Variable Models. Neural Information Processing Systems, vol. 23, pp. 1189-1197, (2010)
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis, Human-level control through deep reinforcement learning. Nature, vol. 518, pp. 529-533, (2015), 10.1038/NATURE14236
Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, Koh Hosoda, Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, vol. 23, pp. 279-303, (1996), 10.1007/BF00117447