Unsupervised Learning and Exploration of Reachable Outcome Space

Authors: Stephane Doncieux, Alban Laflaquière, Alexandre Coninx, Giuseppe Paolo

DOI:

Keywords: Autoencoder, Surprise, Machine learning, Computer science, Artificial intelligence, Novelty, Set (psychology), Population, Space (commercial competition), Reinforcement learning, Unsupervised learning, Outcome (probability)

Abstract: Performing Reinforcement Learning in sparse rewards settings, with very little prior knowledge, is a challenging problem since there is no signal to properly guide the learning process. In such situations, a good search strategy is fundamental. At the same time, not having to adapt the algorithm to every single problem is very desirable. Here we introduce TAXONS, a Task Agnostic eXploration of Outcome spaces through Novelty and Surprise algorithm. Based on a population-based divergent-search approach, it learns a set of diverse policies directly from high-dimensional observations, without any task-specific information. TAXONS builds a repertoire of policies while training an autoencoder on the observation of the final state of the system to build a low-dimensional outcome space. The learned outcome space, combined with the reconstruction error, is used to drive the search for new policies. Results show that TAXONS can find a diverse set of controllers, covering a good part of the ground-truth outcome space, while having no information about such space.
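The abstract describes two scoring signals driving the divergent search: novelty, measured in the autoencoder's learned outcome space, and surprise, measured as the autoencoder's reconstruction error on the final-state observation. A minimal sketch of that scoring logic is given below; note that the random linear encoder/decoder, the dimensions, and all function names are illustrative placeholders (the paper uses a learned convolutional autoencoder over image observations), not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in autoencoder: a fixed random linear encoder/decoder pair,
# used only so the scoring functions below can run end-to-end.
OBS_DIM, LATENT_DIM = 32, 2
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)
W_dec = rng.normal(size=(OBS_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def encode(obs):
    """Map a final-state observation to the low-dimensional outcome space."""
    return W_enc @ obs

def surprise(obs):
    """Reconstruction error of the final-state observation."""
    return float(np.linalg.norm(obs - W_dec @ (W_enc @ obs)))

def novelty(latent, archive, k=3):
    """Mean distance to the k nearest outcome descriptors in the archive."""
    if not archive:
        return float("inf")
    dists = np.sort([np.linalg.norm(latent - a) for a in archive])
    return float(dists[: min(k, len(dists))].mean())

def select(population_obs, archive, use_novelty):
    """Score each policy's final observation by novelty or surprise
    (the search alternates between the two criteria) and return the
    index of the highest-scoring individual."""
    scores = []
    for obs in population_obs:
        z = encode(obs)
        scores.append(novelty(z, archive) if use_novelty else surprise(obs))
    return int(np.argmax(scores))
```

Selected individuals would then have their outcome descriptors added to the archive, while the autoencoder is periodically retrained on the collected final-state observations, refining the outcome space as exploration proceeds.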
