Authors: Zhaoxiang Zang, Dehua Li, Junying Wang, Dan Xia
DOI: 10.1016/J.KNOSYS.2012.11.011
Keywords:
Abstract: In the family of Learning Classifier Systems, the classifier system XCS is the most widely used and investigated. However, standard XCS has difficulties solving large multi-step problems, where long action chains are needed to get delayed rewards. Up to the present, the reinforcement learning technique in XCS has been based on Q-learning, which optimizes the discounted total reward received by an agent but tends to limit the length of action chains. However, there are some undiscounted reinforcement learning methods available, such as R-learning and average reward reinforcement learning in general, which optimize the average reward per time step. In this paper, R-learning is employed as the reinforcement learning technique in XCS, to replace Q-learning. The modification results in a classifier system that is rapid and able to solve large maze problems. In addition, it produces uniformly spaced payoff levels, which can support long action chains and thus effectively prevent the occurrence of overgeneralization.
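
To make the contrast between the two update rules concrete, the following is a minimal tabular sketch of a discounted Q-learning update next to an undiscounted R-learning update in its commonly cited textbook form. It is not the XCS integration described in the paper: the function names, the dictionary-based tables, the state and action labels, and the step sizes `beta`, `alpha` and discount `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, beta=0.2, gamma=0.9):
    """Discounted update: the next state's payoff is scaled by gamma."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += beta * (target - Q[(s, a)])


def r_learning_update(R, rho, s, a, r, s_next, actions, beta=0.2, alpha=0.05):
    """Undiscounted update measured against the average reward rho.

    Returns the (possibly updated) average-reward estimate.
    """
    best_here = max(R[(s, b)] for b in actions)
    best_next = max(R[(s_next, b)] for b in actions)
    was_greedy = R[(s, a)] == best_here  # rho is adjusted only on greedy steps
    R[(s, a)] += beta * (r - rho + best_next - R[(s, a)])
    if was_greedy:
        rho += alpha * (r - rho + best_next - best_here)
    return rho


# Hypothetical two-action example starting from all-zero tables.
actions = ["left", "right"]
Q = defaultdict(float)
R = defaultdict(float)
rho = 0.0

q_learning_update(Q, "s0", "left", 1.0, "s1", actions)
rho = r_learning_update(R, rho, "s0", "left", 1.0, "s1", actions)
```

In the discounted rule, a reward reached after n steps contributes a factor of `gamma ** n`, so payoff differences between distant states shrink geometrically. In the average-reward rule each step instead subtracts the same estimate `rho`, which is one way to see why the abstract speaks of uniformly spaced payoff levels.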