Authors: Fernando Fernández, Manuela M. Veloso
DOI:
Keywords: Policy analysis, Computer science, Probabilistic logic, Reuse, Set (psychology), Machine learning, Metric (unit), Data science, Process (engineering), Artificial intelligence, Markov process, Ranking
Abstract: A past policy provides a bias to guide exploration of the environment and speed up the learning of a new action policy. The success of this bias depends on whether the past policy is similar to the new one. In this report, we describe a new algorithm, PRQ-Learning, that reuses a set of past policies to bias the learning of a new one. The past policies are ranked by a similarity metric that estimates how useful it is to reuse each of them. This ranking provides a probabilistic bias for the exploration in the new learning process. Several experiments demonstrate that PRQ-Learning finds a balance between exploitation of the ongoing learned policy, exploration with random actions, and exploration toward the past policies.
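To make the abstract's description concrete, below is a minimal Python sketch of the policy-reuse loop it outlines: past policies are selected via a softmax over their estimated reuse gains (the probabilistic ranking), and within an episode the agent mixes following the chosen past policy, random exploration, and greedy exploitation of the Q-table being learned. The `env` interface (`actions`, `reset()`, `step()`), the helper names, and the specific parameter values are assumptions for illustration; this is not the authors' reference implementation.

```python
import math
import random
from collections import defaultdict

def prq_learning(env, past_policies, episodes=1000, alpha=0.05, gamma=0.95,
                 tau=0.0, delta_tau=0.05, psi=1.0, upsilon=0.95, epsilon=0.1,
                 max_steps=100):
    """Sketch of PRQ-Learning-style policy reuse (assumed interface:
    env.actions is a list, env.reset() -> state, env.step(a) ->
    (next_state, reward, done); past_policies maps state -> action)."""
    Q = defaultdict(float)                       # Q[(state, action)] for the new task
    # W[j]: running average episode return when reusing policy j (the
    # "reuse gain" that ranks the past policies); the last slot stands
    # for the greedy policy on the current Q-table.
    W = [0.0] * (len(past_policies) + 1)
    U = [0] * (len(past_policies) + 1)           # times each policy was chosen

    def greedy(state):
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        # Probabilistic bias: softmax over reuse gains picks which policy
        # guides exploration this episode.
        weights = [math.exp(tau * w) for w in W]
        j = random.choices(range(len(W)), weights=weights)[0]

        state, ret, discount, psi_e = env.reset(), 0.0, 1.0, psi
        for _ in range(max_steps):
            if j < len(past_policies) and random.random() < psi_e:
                action = past_policies[j](state)     # explore toward past policy
            elif random.random() < epsilon:
                action = random.choice(env.actions)  # random exploration
            else:
                action = greedy(state)               # exploit ongoing learned policy
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            ret += discount * reward
            discount *= gamma
            psi_e *= upsilon                         # decay reuse probability
            state = next_state
            if done:
                break

        # Update the chosen policy's reuse gain with the episode return and
        # sharpen the softmax, shifting from exploration to exploitation.
        U[j] += 1
        W[j] += (ret - W[j]) / U[j]
        tau += delta_tau
    return Q
```

The key design point the abstract emphasizes shows up in the two decay schedules: `psi_e` fades reliance on the past policy within an episode, while the growing temperature `tau` concentrates the softmax on whichever policies have actually paid off, yielding the stated balance among exploitation, random exploration, and exploration toward past policies.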