Real time targeted exploration in large domains

DOI: 10.1109/DEVLRN.2010.5578845

Keywords: Reinforcement learning, Prediction algorithms, Intrinsic motivation, Artificial intelligence, Domain (software engineering), Machine learning, Computer science, Computation, Action (philosophy), Bayesian probability, Decision tree

Abstract: A developing agent needs to explore to learn about the world and learn good behaviors. In many real world tasks, this exploration can take far too long, and the agent must make decisions about which states to explore and which states not to explore. Bayesian methods attempt to address this problem, but take too much computation time to run in reasonably sized domains. In this paper, we present TEXPLORE, the first algorithm to perform targeted exploration in real time in large domains. The algorithm learns multiple possible models of the domain that generalize action effects across states. We experiment with possible ways of adding intrinsic motivation to the agent to drive exploration. TEXPLORE is fully implemented and tested in a novel domain called Fuel World that is designed to reflect the type of targeted exploration needed in the real world. We show that our algorithm significantly outperforms representative examples of both model-free and model-based RL algorithms from the literature and is able to quickly learn to perform well in a large world in real-time.