Two steps reinforcement learning

Authors: Daniel Borrajo, Fernando Fernández

DOI: 10.1002/INT.V23:2

Keywords:

Abstract: When applying reinforcement learning in domains with very large or continuous state spaces, the experience obtained by the agent in its interaction with the environment must be generalized. The generalization methods are usually based on the approximation of the value functions used to compute the action policy, and are tackled in two different ways: on the one hand, by using a supervised learning method as a function approximator; on the other hand, by discretizing the environment to use a tabular representation of the value functions. In this work, we propose an algorithm that uses both approaches to take advantage of the benefits of both mechanisms, allowing a higher performance. The approach is based on two learning phases. In the first one, the learner uses a supervised method as a function approximator, but one that also outputs a state space discretization of the environment, as nearest prototype classifiers or decision trees do. In the second phase, the discretization computed in the first phase is used to obtain a tabular representation of the value function, tuning the previous approximation. Experiments show that executing both phases improves the results obtained by executing only one of them, taking into account both the resources used and the performance of the learned behavior. © 2008 Wiley Periodicals, Inc.
