Real time targeted exploration in large domains

Authors: Todd Hester, Peter Stone

DOI: 10.1109/DEVLRN.2010.5578845

Keywords: Reinforcement learning; Prediction algorithms; Intrinsic motivation; Artificial intelligence; Domain (software engineering); Machine learning; Computer science; Computation; Action (philosophy); Bayesian probability; Decision tree

Abstract: A developing agent needs to explore to learn about the world and to learn good behaviors. In many real-world tasks, this exploration can take far too long, and the agent must decide which states to explore and which not to explore. Bayesian methods attempt to address this problem, but they take too much computation time to run in reasonably sized domains. In this paper, we present TEXPLORE, the first algorithm to perform targeted exploration in real time in large domains. The algorithm learns multiple possible models of the domain that generalize action effects across states. We experiment with ways of adding intrinsic motivation to drive exploration. TEXPLORE is fully implemented and tested in a novel domain called Fuel World, designed to reflect the type of targeted exploration needed in the real world. We show that our algorithm significantly outperforms representative examples of both model-free and model-based RL algorithms from the literature and is able to quickly learn to perform well in real time.
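The abstract's two technical ideas are learning several plausible models that generalize action effects across states, and adding intrinsic motivation to drive exploration. The sketch below is NOT TEXPLORE and does not reproduce the authors' method; it is a minimal illustration of the general pattern under stated assumptions. It assumes a 1-D numeric state. Each model treats an action's effect as a constant offset, so one observation generalizes to every state. The paper's keywords mention decision trees, but this sketch deliberately uses the simpler constant-offset model. Ensemble members come from per-action bootstrap resampling, and the intrinsic-motivation bonus is the variance of the members' predictions. All names (`ActionEffectModel`, `fit_ensemble`, `disagreement`, `choose_action`, `bonus_weight`) are hypothetical.

```python
import random
from statistics import mean, pvariance


class ActionEffectModel:
    """Hypothetical ensemble member. Assumes an action's effect on a 1-D state
    is a constant offset, so what it learns in one state applies to all states."""

    def __init__(self, deltas_by_action):
        # action -> mean observed change in state (s_next - s)
        self.effect = {a: mean(d) for a, d in deltas_by_action.items()}

    def predict(self, s, a):
        if a not in self.effect:
            return None  # this model has never observed the action
        return s + self.effect[a]


def fit_ensemble(transitions, k, seed=0):
    """Train k models on per-action bootstrap resamples of (s, a, s_next) data."""
    rng = random.Random(seed)
    deltas = {}
    for s, a, s_next in transitions:
        deltas.setdefault(a, []).append(s_next - s)
    models = []
    for _ in range(k):
        # Resample each action's deltas with replacement, so every model
        # sees every observed action but with different data weights.
        sample = {a: rng.choices(d, k=len(d)) for a, d in deltas.items()}
        models.append(ActionEffectModel(sample))
    return models


def disagreement(models, s, a):
    """Assumed intrinsic-motivation bonus: variance of predicted next states."""
    preds = [m.predict(s, a) for m in models]
    if any(p is None for p in preds):
        return float("inf")  # unseen action: treat as maximally uncertain
    return pvariance(preds)


def choose_action(models, s, actions, goal, bonus_weight=1.0):
    """Greedy one-step choice: closeness of the mean prediction to a goal
    state plus a weighted disagreement bonus."""
    def score(a):
        bonus = disagreement(models, s, a)
        if bonus == float("inf"):
            return bonus
        predicted = mean([m.predict(s, a) for m in models])
        return -abs(goal - predicted) + bonus_weight * bonus
    return max(actions, key=score)


data = [(0, "right", 1), (1, "right", 2), (2, "left", 1), (3, "left", 2)]
models = fit_ensemble(data, k=5)
print(disagreement(models, 10, "right"))  # 0: every observed "right" moved +1
print(choose_action(models, 0, ["left", "right", "stay"], goal=5))  # stay
print(choose_action(models, 0, ["left", "right"], goal=5))  # right
```

In this sketch, an action no model has seen gets an infinite bonus, so it is tried before any known action. The "right" effect learned from states 0 and 1 is reused unchanged at state 10, which is the kind of cross-state generalization the abstract describes. As observations for an action become consistent, the ensemble's variance shrinks and the choice shifts from exploring toward pursuing the goal.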
