Authors: Jordan Frank, Shie Mannor, Doina Precup
Keywords: Learning classifier system, Machine learning, Semi-supervised learning, Q-learning, Reinforcement learning, Bellman equation, Rare events, Generalization error, Unsupervised learning, Artificial intelligence, Markov decision process, Temporal difference learning, Computer science
Abstract: We consider the task of reinforcement learning in an environment in which rare significant events occur independently of the actions selected by the controlling agent. If these events are sampled according to their natural probability of occurring, convergence of conventional algorithms is likely to be slow, and the algorithms may exhibit high variance. In this work, we assume that we have access to a simulator, in which the event probabilities can be artificially altered. Then, importance sampling can be used to learn with the simulation data. We introduce algorithms for policy evaluation, using both tabular and function approximation representations of the value function. We prove that in both cases the algorithms converge. In the tabular case, we also analyze the bias and variance of our approach compared to TD-learning. We evaluate empirically the performance of the algorithm on random Markov Decision Processes, as well as on a large network planning task.
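To make the core idea concrete, here is a minimal sketch of importance-sampled tabular TD(0) policy evaluation in the spirit the abstract describes: the simulator inflates the probability of the rare event, and each update is re-weighted by the per-step likelihood ratio between the natural and simulated event distributions. The toy chain MDP, the constants, and the exact placement of the weight on the TD error are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy chain MDP under a fixed policy. A rare "failure" event
# occurs independently of the agent's actions: it resets the agent to state 0
# with a large penalty. The simulator over-samples the event (P_SIM >> P_NAT)
# so that it is observed often enough to learn from.
N_STATES = 5
P_NAT = 1e-3   # natural probability of the rare event (assumed)
P_SIM = 0.1    # inflated probability used by the simulator (assumed)
GAMMA = 0.95
ALPHA = 0.05

def step(state, event):
    """One transition of the toy MDP; returns (next_state, reward)."""
    if event:
        return 0, -100.0               # rare event: reset with heavy penalty
    return min(state + 1, N_STATES - 1), 1.0

V = np.zeros(N_STATES)
state = 0
for _ in range(200_000):
    event = rng.random() < P_SIM       # sample under the altered dynamics
    # Per-step likelihood ratio of natural vs. simulated event distribution.
    rho = (P_NAT / P_SIM) if event else ((1 - P_NAT) / (1 - P_SIM))
    next_state, reward = step(state, event)
    td_error = reward + GAMMA * V[next_state] - V[state]
    V[state] += ALPHA * rho * td_error  # importance-weighted TD(0) update
    state = next_state

print(V)  # estimated values under the *natural* event probability
```

Because each transition's weight corrects for the altered sampling, the estimates target the value function under the natural dynamics while the rare event is still observed frequently; the abstract's bias/variance analysis concerns exactly this trade-off relative to plain TD-learning.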