Authors: Wolfram Burgard, Tobias Gindele, Michael Herman, Felix Schmitt, Jörg Wagner
DOI:
Keywords: Optimization problem, Mathematical optimization, Function (mathematics), Task (project management), Computer science, Inverse problem, Sample (statistics), Markov decision process, Transfer of learning
Abstract: Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from the observed behavior of an agent. Since the agent's behavior originates in its policy, and MDP policies depend on both the stochastic system dynamics and the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that, if the transition model is unknown, additional samples of the system's dynamics are accessible, or that the observed behavior provides enough samples to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer task shows improvements regarding sample efficiency and the accuracy of the estimated reward functions and transition models.
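To make the idea of a combined optimization over rewards and dynamics concrete, the following is a minimal, hypothetical sketch, not the authors' exact algorithm. It assumes a toy chain MDP, one-hot state features, a reward that is linear in those features, a transition model parameterized by a single slip probability, a MaxEnt-style soft policy, and finite-difference gradients in place of analytic ones. Both the reward weights and the dynamics parameter are fitted jointly by maximizing the likelihood of demonstrated transitions.

```python
import numpy as np

# Hypothetical toy setup (assumed, not from the paper): a 5-state chain MDP.
n_states, n_actions = 5, 2
features = np.eye(n_states)  # one-hot state features (assumed)

def transitions(slip):
    """Transition model parameterized by a single slip probability (assumed)."""
    P = np.zeros((n_actions, n_states, n_states))
    for s in range(n_states):
        right, left = min(s + 1, n_states - 1), max(s - 1, 0)
        P[0, s, right] += 1 - slip; P[0, s, left] += slip   # action 0: move right
        P[1, s, left] += 1 - slip;  P[1, s, right] += slip  # action 1: move left
    return P

def soft_policy(theta, slip, gamma=0.9, iters=100):
    """Soft value iteration yielding a stochastic MaxEnt-style policy."""
    r, P = features @ theta, transitions(slip)
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[None, :] + gamma * P @ V                      # Q[a, s]
        m = Q.max(axis=0)
        V = m + np.log(np.exp(Q - m[None, :]).sum(axis=0))  # soft max over actions
    pi = np.exp(Q - V[None, :])                             # pi[a, s]
    return pi / pi.sum(axis=0, keepdims=True), P

def neg_log_likelihood(params, demos):
    """Joint objective: likelihood of demos under policy AND transition model."""
    theta, slip = params[:-1], 1 / (1 + np.exp(-params[-1]))  # keep slip in (0,1)
    pi, P = soft_policy(theta, slip)
    ll = sum(np.log(pi[a, s] + 1e-12) + np.log(P[a, s, sn] + 1e-12)
             for s, a, sn in demos)
    return -ll

def grad_fd(f, x, eps=1e-5):
    """Central finite-difference gradient (used for brevity)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x); d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

# Hypothetical demonstrations: (state, action, next_state) tuples.
demos = [(0, 0, 1), (1, 0, 2), (2, 0, 3), (3, 0, 4), (4, 0, 4)]

params = np.zeros(n_states + 1)  # reward weights + logit of slip probability
for step in range(200):
    params -= 0.1 * grad_fd(lambda p: neg_log_likelihood(p, demos), params)

print("estimated reward weights:", params[:-1])
print("estimated slip probability:", 1 / (1 + np.exp(-params[-1])))
```

Because the policy term of the likelihood itself depends on the transition model through soft value iteration, the dynamics estimate is informed by the demonstrators' policy bias rather than fitted from transition counts alone, which is the intuition the abstract describes.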