Importance Resampling for Off-policy Prediction

Authors: Martha White, Matthew Schlegel, Wesley Chung, Daniel Graves, Jian Qian

DOI:

Keywords:

Abstract: Importance sampling (IS) is a common reweighting strategy for off-policy prediction in reinforcement learning. While it is unbiased and consistent, it can result in high variance updates to the weights for the value function. In this work, we explore resampling as an alternative to reweighting. We propose Importance Resampling (IR) for off-policy prediction, which resamples experience from a replay buffer and applies standard on-policy updates. The approach avoids using importance sampling ratios in the update, instead correcting the distribution before the update. We characterize the bias and consistency of IR, particularly compared to Weighted IS (WIS). We demonstrate in several microworlds that IR has improved sample efficiency and lower variance updates compared to IS and several variance-reduced IS strategies, including variants of WIS and V-trace, which clips ratios. We also provide a demonstration showing IR improves over IS for learning a value function from images in a racing car simulator.
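The abstract describes correcting the sampling distribution before the update rather than weighting each update by an importance ratio. A minimal sketch of that idea is below, under assumptions of my own and not the authors' implementation: a replay buffer of (state, action, reward, next_state, behaviour_probability) tuples with integer-indexed states, a hypothetical `target_policy_prob` function, and tabular TD(0) for the on-policy prediction update.

```python
import numpy as np

def importance_resample(buffer, target_policy_prob, batch_size, rng):
    """Sample transitions with probability proportional to the importance
    ratio rho = pi(a|s) / mu(a|s), instead of weighting each update by rho
    as ordinary importance sampling would."""
    rhos = np.array([
        target_policy_prob(s, a) / mu_prob
        for (s, a, r, s_next, mu_prob) in buffer
    ])
    probs = rhos / rhos.sum()
    idx = rng.choice(len(buffer), size=batch_size, p=probs)
    return [buffer[i] for i in idx]

def td0_update(V, batch, alpha=0.1, gamma=0.99):
    """Standard on-policy TD(0) update applied to the resampled batch;
    note that no importance ratio appears in the update itself."""
    for (s, a, r, s_next, _) in batch:
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```

In this sketch the variance reduction comes from moving the ratio out of the gradient step and into the sampling distribution, which is the contrast with IS and WIS drawn in the abstract.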
