Authors: Jivko Sinapov, Matteo Leonetti, Peter Stone, Sanmit Narvekar
Keywords:
Abstract: In a reinforcement learning setting, the goal of transfer is to improve performance on a target task by re-using knowledge from one or more source tasks. A key problem is how to choose appropriate source tasks for a given target task. Current approaches typically require that the agent has some experience in the domain, or that the target task is specified by a model (e.g., a Markov Decision Process) with known parameters. To address these limitations, this paper proposes a framework for selecting source tasks in the absence of target task samples. Instead, our approach uses meta-data (e.g., attribute-value pairs) associated with each task to learn the expected benefit of transfer given a source-target task pair. To test the method, we conducted a large-scale experiment in the Ms. Pac-Man domain in which an agent played over 170 million games spanning 192 task variations. The agent used this vast amount of experience to learn the benefit (or detriment) of transferring knowledge from one task to another. Subsequently, the agent successfully selected source tasks for previously unseen target tasks.