Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

Authors: Wolfram Burgard, Tobias Gindele, Michael Herman, Felix Schmitt, Jörg Wagner

DOI:

Keywords: Optimization problem; Mathematical optimization; Function (mathematics); Task (project management); Computer science; Inverse problem; Sample (statistics); Markov decision process; Transfer of learning

Abstract: Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy, and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or that the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements regarding the sample efficiency as well as the accuracy of the estimated reward functions and transition models.
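The core idea of the abstract — jointly fitting reward and transition parameters to demonstrations by gradient-based optimization of their likelihood under a demonstration-generating policy — can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: a 5-state chain MDP with a single slip-probability dynamics parameter, a maximum-entropy (Boltzmann) policy model, and numerical finite-difference gradients on the joint negative log-likelihood.

```python
import numpy as np

GAMMA = 0.9
NS, NA = 5, 2  # toy chain MDP: 5 states, actions {left, right}

def make_P(theta):
    """Transition model: intended move succeeds w.p. 1-theta, slips w.p. theta."""
    P = np.zeros((NS, NA, NS))
    for s in range(NS):
        left, right = max(s - 1, 0), min(s + 1, NS - 1)
        P[s, 0, left] += 1 - theta
        P[s, 0, right] += theta
        P[s, 1, right] += 1 - theta
        P[s, 1, left] += theta
    return P

def soft_policy(P, r, iters=200):
    """Soft (MaxEnt) value iteration -> Boltzmann policy pi(a|s)."""
    Q = np.zeros((NS, NA))
    for _ in range(iters):
        V = np.log(np.exp(Q).sum(axis=1))
        Q = r[:, None] + GAMMA * (P @ V)
    V = np.log(np.exp(Q).sum(axis=1))
    return np.exp(Q - V[:, None])

def simulate(P, pi, rng, episodes=40, horizon=10):
    """Sample (s, a, s') demonstration transitions from the expert policy."""
    demos = []
    for _ in range(episodes):
        s = rng.integers(NS)
        for _ in range(horizon):
            a = rng.choice(NA, p=pi[s])
            sp = rng.choice(NS, p=P[s, a])
            demos.append((s, a, sp))
            s = sp
    return demos

def avg_nll(params, demos):
    """Joint objective: policy likelihood (reward-dependent, biased by the
    generating policy) plus transition likelihood (dynamics-dependent)."""
    w, theta = params[:NS], np.clip(params[NS], 1e-3, 0.499)
    P = make_P(theta)
    pi = soft_policy(P, w)  # reward features = state indicators, r = w
    ll = sum(np.log(pi[s, a] + 1e-12) + np.log(P[s, a, sp] + 1e-12)
             for s, a, sp in demos)
    return -ll / len(demos)

rng = np.random.default_rng(0)
true_w, true_theta = np.array([0., 0., 0., 0., 1.]), 0.2
demos = simulate(make_P(true_theta),
                 soft_policy(make_P(true_theta), true_w), rng)

# Gradient descent on the combined objective via finite differences.
params = np.concatenate([np.zeros(NS), [0.4]])  # start far from true theta
eps, lr = 1e-4, 0.1
nll0 = avg_nll(params, demos)
for _ in range(150):
    base = avg_nll(params, demos)
    g = np.zeros_like(params)
    for i in range(len(params)):
        p = params.copy()
        p[i] += eps
        g[i] = (avg_nll(p, demos) - base) / eps
    params -= lr * g
    params[NS] = np.clip(params[NS], 1e-3, 0.499)
nll1 = avg_nll(params, demos)
print(f"NLL {nll0:.3f} -> {nll1:.3f}, estimated slip prob {params[NS]:.3f}")
```

Because the transition parameter enters both the transition likelihood and the policy (through value iteration), optimizing them jointly lets the demonstrations' policy bias inform the dynamics estimate — the point the abstract emphasizes over treating observed transitions as plain dynamics samples.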

References (0)