Authors: Thore Graepel, John Shawe-Taylor, Diana Borsa
Abstract: We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks within this space. Since the environment does not change, there is potentially a lot of common ground amongst tasks, and learning to solve them individually seems extremely wasteful. In this paper, we explicitly model and learn this shared structure as it arises in the state-action value space. We show how one can jointly learn optimal value functions by modifying the popular Value-Iteration and Policy-Iteration procedures to accommodate this shared representation assumption and leverage the power of multi-task supervised learning. Finally, we demonstrate that the proposed model and training procedures are able to infer good value functions, even under low-sample regimes. In addition to data efficiency, our analysis shows that learning abstractions of the state space jointly across tasks leads to more robust, transferable representations with the potential for better generalization.
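To make the core idea concrete, below is a minimal numpy sketch of value iteration coupled across tasks through a shared representation. This is an illustration under assumptions, not the paper's exact algorithm: tasks share one transition kernel and differ only in rewards, and the "shared structure" is modeled as a low-rank subspace that all tasks' value functions are projected onto after each Bellman backup. All names, dimensions, and the SVD-based projection step are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one random MDP shared by all tasks;
# tasks differ only in their reward functions.
n_states, n_actions, n_tasks = 20, 4, 5
rank, gamma = 3, 0.9  # rank = dimension of the shared representation

# Shared transition kernel P[s, a, s'] (row-stochastic over s').
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)

# Task-specific rewards R[t, s, a].
R = rng.random((n_tasks, n_states, n_actions))

# One state-action value function per task: Q[t, s, a].
Q = np.zeros((n_tasks, n_states, n_actions))

for _ in range(200):
    # Standard Bellman optimality backup, computed per task.
    V = Q.max(axis=2)                                   # V[t, s']
    targets = R + gamma * np.einsum('sap,tp->tsa', P, V)

    # Joint step: stack all tasks' targets as columns and project
    # onto a shared rank-k subspace, coupling the tasks.
    Y = targets.reshape(n_tasks, -1).T                  # (|S||A|, n_tasks)
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_low = U[:, :rank] * S[:rank] @ Vt[:rank]          # shared features x task weights
    Q_new = Y_low.T.reshape(n_tasks, n_states, n_actions)

    if np.abs(Q_new - Q).max() < 1e-6:
        break
    Q = Q_new

# U[:, :rank] plays the role of the learned shared state-action
# representation; each task keeps only a small weight vector on top of it.
print("Greedy policy per task:", Q.argmax(axis=2).shape)  # (n_tasks, n_states)
```

The low-rank projection is one simple way to instantiate the shared-representation assumption; the benefit the abstract points to (data efficiency) comes from each task needing to estimate only a small task-specific weight vector once the shared features are fit across all tasks jointly.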