Reinforcement learning in the presence of rare events

Authors: Jordan Frank, Shie Mannor, Doina Precup

DOI: 10.1145/1390156.1390199

Keywords: Learning classifier system, Machine learning, Semi-supervised learning, Q-learning, Reinforcement learning, Bellman equation, Rare events, Generalization error, Unsupervised learning, Artificial intelligence, Markov decision process, Temporal difference learning, Computer science

Abstract: We consider the task of reinforcement learning in an environment in which rare significant events occur independently of the actions selected by the controlling agent. If these events are sampled according to their natural probability of occurring, convergence of conventional algorithms is likely to be slow, and the algorithms may exhibit high variance. In this work, we assume that we have access to a simulator in which the event probabilities can be artificially altered. Importance sampling can then be used to learn from this simulation data. We introduce algorithms for policy evaluation, using both tabular and function approximation representations of the value function. We prove that, in both cases, the algorithms converge. In the tabular case, we also analyze the bias and variance of our approach compared to TD-learning. We evaluate empirically the performance of the algorithm on random Markov Decision Processes, as well as on a large network planning task.
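The importance-sampling idea in the abstract can be sketched in a few lines. This is a minimal, hypothetical example (not the authors' implementation): a single decision point where a rare event has natural probability `P_NAT`, but the simulator samples it with an inflated probability `P_SIM`; each tabular TD(0) update is reweighted by the likelihood ratio of natural to simulated dynamics, so the value estimate remains consistent with the natural event probability.

```python
import random

# Hypothetical sketch of importance-sampled tabular TD(0) policy evaluation.
# A rare event occurs with natural probability P_NAT, but the simulator
# samples it with inflated probability P_SIM; the update is reweighted by
# the likelihood ratio rho = p_natural / p_simulated.

P_NAT = 0.01   # natural rare-event probability
P_SIM = 0.30   # inflated probability used in the simulator
GAMMA = 0.9    # discount factor
ALPHA = 0.05   # step size

def step():
    """One simulated transition from the start state under inflated dynamics.

    Returns (next_state, reward, likelihood_ratio)."""
    if random.random() < P_SIM:
        return "event", -10.0, P_NAT / P_SIM            # rare, costly branch
    return "normal", 1.0, (1 - P_NAT) / (1 - P_SIM)     # common branch

def td_is(episodes=20000, seed=0):
    """Estimate state values under the *natural* dynamics from inflated samples."""
    random.seed(seed)
    V = {"start": 0.0, "event": 0.0, "normal": 0.0}     # terminal values stay 0
    for _ in range(episodes):
        s = "start"
        s2, r, rho = step()
        # importance-weighted TD(0) update
        V[s] += ALPHA * rho * (r + GAMMA * V[s2] - V[s])
    return V
```

Under the natural dynamics the true start-state value is 0.01 * (-10) + 0.99 * 1 = 0.89, and the reweighted estimate concentrates near it even though the simulator sees the rare event 30% of the time; without the ratio `rho`, the estimate would be biased toward the inflated dynamics.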
