Authors: Daniel Borrajo, Fernando Fernández
DOI: 10.1002/INT.V23:2
Keywords:
Abstract: When applying reinforcement learning in domains with very large or continuous state spaces, the experience obtained by the agent in its interaction with the environment must be generalized. The generalization methods are usually based on the approximation of the value functions used to compute the action policy, and they are tackled in two different ways: on the one hand, by approximating the value functions with a supervised learning method; on the other hand, by discretizing the environment in order to use a tabular representation of the value functions. In this work, we propose an algorithm that uses both approaches to take advantage of the benefits of both mechanisms, allowing higher performance. The approach is based on two learning phases. In the first one, a learner is used as a function approximator, but employing a machine learning technique that also outputs a discretization of the state space of the environment, as nearest prototype classifiers or decision trees do. In the second phase, the discretization computed in the first phase is used to obtain a tabular representation of the value function computed previously, allowing a fine tuning of the approximation. Experiments show that executing both phases improves the results obtained by executing only one of them, when both the computational resources used and the performance of the learned behavior are taken into account. © 2008 Wiley Periodicals, Inc.
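A minimal sketch of the two-phase idea described in the abstract (not the authors' implementation): phase 1 fits a decision-tree value-function approximator whose leaves also induce a state-space discretization; phase 2 runs tabular Q-learning over those leaf cells to tune the approximation. The toy environment, hyperparameters, and helper names (`step`, `cell`) are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
N_ACTIONS, GAMMA, ALPHA = 2, 0.95, 0.1

def step(s, a):
    """Toy continuous 1-D task (assumption): move left/right, reward when s > 0.9."""
    s2 = float(np.clip(s + (0.05 if a == 1 else -0.05) + rng.normal(0, 0.01), 0.0, 1.0))
    return s2, (1.0 if s2 > 0.9 else 0.0)

# Gather random-exploration experience.
transitions, s = [], rng.random()
for _ in range(5000):
    a = int(rng.integers(N_ACTIONS))
    s2, r = step(s, a)
    transitions.append((s, a, r, s2))
    s = rng.random() if r > 0 else s2

# Phase 1: supervised approximation of the state value from one-step rewards.
# The fitted tree's leaves partition the continuous state space "for free".
X = np.array([[t[0]] for t in transitions])
y = np.array([t[2] for t in transitions])
tree = DecisionTreeRegressor(max_leaf_nodes=16).fit(X, y)
leaf_index = {leaf: i for i, leaf in enumerate(np.unique(tree.apply(X)))}

def cell(state):
    """Map a continuous state to its discrete cell (the tree leaf it falls in)."""
    return leaf_index[tree.apply([[state]])[0]]

# Phase 2: tabular Q-learning over the leaf cells, replaying the stored
# experience, to tune the value function in the discretized space.
Q = np.zeros((len(leaf_index), N_ACTIONS))
for _ in range(20):                       # replay passes over the stored transitions
    for s, a, r, s2 in transitions:
        i, j = cell(s), cell(s2)
        Q[i, a] += ALPHA * (r + GAMMA * Q[j].max() - Q[i, a])

# Greedy policy from the tuned tabular values; should prefer moving right.
print("greedy action at s=0.5:", int(Q[cell(0.5)].argmax()))
```

Using the tree's leaves as the discretization keeps the two phases consistent: the same partition that supported the supervised fit becomes the index set of the Q-table, so phase 2 only refines values within regions the approximator already found informative.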