Authors: Stephane Doncieux, Alban Laflaquière, Alexandre Coninx, Giuseppe Paolo
DOI:
Keywords: Autoencoder, Surprise, Novelty, Machine learning, Computer science, Artificial intelligence, Population, Outcome space, Reinforcement learning, Unsupervised learning
Abstract: Performing Reinforcement Learning in sparse-reward settings, with very little prior knowledge, is a challenging problem, since there is no signal to properly guide the learning process. In such situations, a good search strategy is fundamental. At the same time, not having to adapt the algorithm to every single problem is desirable. Here we introduce TAXONS, a Task Agnostic eXploration of Outcome spaces through Novelty and Surprise algorithm. Based on a population-based divergent-search approach, it learns a set of diverse policies directly from high-dimensional observations, without any task-specific information. TAXONS builds a repertoire of policies while training an autoencoder on the observation of the final state of the system to build a low-dimensional outcome space. The learned space, combined with the reconstruction error, is used to drive the search for new policies. Results show that TAXONS can find diverse controllers, covering part of the ground-truth outcome space, while having no prior information about it.
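The abstract describes scoring policies by two signals: novelty (distance of a policy's latent outcome descriptor to those already in the repertoire) and surprise (the autoencoder's reconstruction error on the final-state observation). The sketch below is a minimal illustration of this idea, not the authors' implementation: it uses a toy linear encoder/decoder `W` as a stand-in for the trained autoencoder, and the function names (`encode`, `decode`, `novelty`, `surprise`) are hypothetical.

```python
import numpy as np

def encode(obs, W):
    # Toy linear "encoder": project a high-dimensional observation
    # into a low-dimensional latent outcome descriptor.
    return W @ obs

def decode(z, W):
    # Toy linear "decoder": map the latent descriptor back to
    # observation space (stand-in for the autoencoder's decoder).
    return W.T @ z

def novelty(z, archive, k=3):
    # Novelty: mean Euclidean distance from z to its k nearest
    # latent descriptors already stored in the repertoire.
    if len(archive) == 0:
        return np.inf  # an empty repertoire makes everything novel
    dists = np.sort([np.linalg.norm(z - a) for a in archive])
    return float(np.mean(dists[:k]))

def surprise(obs, W):
    # Surprise: reconstruction error of the (here: linear) autoencoder;
    # poorly reconstructed states are ones the model has seen little of.
    return float(np.linalg.norm(obs - decode(encode(obs, W), W)))
```

In a divergent-search loop, policies whose final-state descriptors score high on either signal would be kept and mutated, so exploration is driven both toward unvisited regions of the learned outcome space and toward states the autoencoder does not yet model well.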