Authors: Scott Niekum, Peter Stone, Ishan Durugkar, Mauricio Tec
DOI:
Keywords: State (computer science), Hindsight bias, Reinforcement learning, Dual (category theory), Robotics, Markov decision process, Probability mass function, Function (engineering), Artificial intelligence, Computer science
Abstract: Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we …