Probabilistic Reuse of Past Policies

Authors: Fernando Fernández, Manuela M. Veloso

DOI:

Keywords: Policy analysis, Computer science, Probabilistic logic, Reuse, Set (psychology), Machine learning, Metric (unit), Data science, Process (engineering), Artificial intelligence, Markov process, Ranking

Abstract: A past policy provides a bias to guide the exploration of the environment and speed up the learning of a new action policy. The success of this bias depends on whether the past policy is "similar" to the new policy or not. In this report, we describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked following a similarity metric that estimates how useful it is to reuse each of those past policies. This ranking provides a probabilistic bias for the exploration in the new learning process. Several experiments demonstrate that PRQ-Learning finds a balance between exploitation of the ongoing learned policy, exploration of random actions, and exploration toward the past policies.
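The two probabilistic choices the abstract mentions can be made concrete with a small sketch: an episode-level softmax over each candidate policy's average reuse gain (the ranking), and a step-level mix of past-policy actions, random exploration, and greedy exploitation (the balance). The Python below is a minimal, illustrative implementation; the names (PolicyReuseSelector, pi_reuse_action) and hyperparameters (tau, delta_tau, psi, epsilon) are assumptions for illustration, not taken from the report.

```python
import math
import random

class PolicyReuseSelector:
    """Illustrative sketch of the probabilistic policy-selection step.

    Each candidate policy k (index 0 is the ongoing policy, 1..n are the
    past policies) carries an average reuse gain W[k]. The policy that
    guides the next episode is drawn from a softmax over those gains.
    """

    def __init__(self, n_past_policies, tau=0.0, delta_tau=0.05):
        self.W = [0.0] * (n_past_policies + 1)  # average return earned per policy
        self.U = [0] * (n_past_policies + 1)    # episodes each policy has guided
        self.tau = tau                           # softmax temperature (assumed schedule)
        self.delta_tau = delta_tau               # temperature increment per episode

    def choose_policy(self):
        """Draw a guiding policy with probability proportional to exp(tau * W[k])."""
        weights = [math.exp(self.tau * w) for w in self.W]
        total = sum(weights)
        r = random.uniform(0.0, total)
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                return k
        return len(weights) - 1

    def record_episode(self, k, episode_return):
        """Fold the episode's return into policy k's average gain, then
        raise the temperature so selection gradually favors the best bias."""
        self.U[k] += 1
        self.W[k] += (episode_return - self.W[k]) / self.U[k]
        self.tau += self.delta_tau


def pi_reuse_action(past_action, greedy_action, n_actions, psi, epsilon):
    """Step-level exploration mix: with probability psi follow the past
    policy's action, otherwise act epsilon-greedily on the new Q function."""
    if random.random() < psi:
        return past_action
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return greedy_action
```

Under this sketch, tau starts near zero so all policies in the library are tried roughly uniformly, and it grows as evidence accumulates, shifting the agent from exploring which past policy is similar toward exploiting the most useful one.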

References (13)
Fernando Fernández, Manuela M. Veloso, Exploration and Policy Reuse (2005)
Daniel Borrajo, Fernando Fernández, On Determinism Handling While Learning Reduced State Space Representations. European Conference on Artificial Intelligence, pp. 380-384 (2002)
James L. Carroll, Todd Peterson, Fixed vs. Dynamic Sub-Transfer in Reinforcement Learning. International Conference on Machine Learning and Applications, pp. 3-8 (2002)
Manuela Veloso, William Taubman Bryant Uther, Tree Based Hierarchical Reinforcement Learning. Carnegie Mellon University (2002)
Doina Precup, Satinder P. Singh, Richard S. Sutton, Intra-Option Learning about Temporally Abstract Actions. International Conference on Machine Learning, pp. 556-564 (1998)
Sebastian B. Thrun, Efficient Exploration in Reinforcement Learning. Carnegie Mellon University (1992)
James Bruce, Manuela M. Veloso, Real-Time Randomized Path Planning for Robot Navigation. Intelligent Robots and Systems, vol. 3, pp. 288-295 (2002), 10.1007/978-3-540-45135-8_23
Manuela M. Veloso, Jaime G. Carbonell, Derivational Analogy in PRODIGY: Automating Case Acquisition, Storage, and Utilization. Machine Learning, vol. 10, pp. 249-278 (1993), 10.1023/A:1022686910523
Sebastian Thrun, Anton Schwartz, Finding Structure in Reinforcement Learning. Neural Information Processing Systems, vol. 7, pp. 385-392 (1994)