Learning classifier system with average reward reinforcement learning

Authors: Zhaoxiang Zang, Dehua Li, Junying Wang, Dan Xia

DOI: 10.1016/J.KNOSYS.2012.11.011

Abstract: In the family of Learning Classifier Systems, the classifier system XCS is the most widely used and investigated. However, standard XCS has difficulty solving large multi-step problems, where long action chains are needed to obtain delayed rewards. Up to the present, the reinforcement learning technique in XCS has been based on Q-learning, which optimizes the discounted total reward received by an agent but tends to limit the length of action chains. However, there are some undiscounted reinforcement learning methods available, such as R-learning and average reward reinforcement learning in general, which optimize the average reward per time step. In this paper, R-learning is employed as the reinforcement learning technique in XCS, to replace Q-learning. The modification results in a classifier system that is rapid and able to solve large maze problems. In addition, it produces uniformly spaced payoff levels, which can support long action chains and thus effectively prevent the occurrence of overgeneralization.
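
The contrast the abstract draws between discounted and average-reward learning can be illustrated with the two tabular update rules in their usual textbook form. This is only a minimal sketch, not the XCS integration described in the paper: the function names, the dictionary-based tables, and the parameter names and values (`beta`, `alpha`, `gamma`) are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, beta=0.2, gamma=0.9):
    """Discounted update: future value is scaled by gamma at every step,
    so payoffs far from the reward shrink geometrically."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += beta * (target - Q[(s, a)])


def r_learning_update(R, rho, s, a, r, s_next, actions, beta=0.2, alpha=0.05):
    """Undiscounted update: values are measured relative to the estimated
    average reward per step, rho. Returns the updated rho."""
    best_next = max(R[(s_next, b)] for b in actions)
    R[(s, a)] += beta * (r - rho + best_next - R[(s, a)])

    # rho is adjusted only when the action taken is greedy after the update.
    best_here = max(R[(s, b)] for b in actions)
    if R[(s, a)] == best_here:
        best_next = max(R[(s_next, b)] for b in actions)
        rho += alpha * (r + best_next - best_here - rho)
    return rho


# Hypothetical two-action example with all tables initialised to zero.
actions = ["left", "right"]
Q = defaultdict(float)
R = defaultdict(float)
q_learning_update(Q, 0, "left", 1.0, 1, actions)
rho = r_learning_update(R, 0.0, 0, "left", 1.0, 1, actions)
```

In the discounted rule the value of a state-action pair is multiplied by `gamma` for each additional step of delay before the reward, whereas the average-reward rule subtracts `rho` once per step. This is consistent with the abstract's remark that the modified system yields uniformly spaced payoff levels.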