Deep Inverse Reinforcement Learning by Logistic Regression

Author: Eiji Uchibe

DOI: 10.1007/978-3-319-46687-3_3

Abstract: This study proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. It builds on our previous method, which exploits the fact that, under linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given in part by the difference of value functions and can be estimated by logistic regression. However, the reward function was assumed to be linear, with basis functions prepared in advance. To overcome this limitation, we employ deep neural network frameworks to implement the logistic regression. Simulation results show performance comparable to model-based methods with less computing effort on the Objectworld benchmark. In addition, a policy trained with reward shaping using the estimated value functions outperforms the policies used to collect the data in the game of Reversi.
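The core of the approach is a density-ratio identity: in a linearly solvable MDP with discount γ, the optimal transition density π(y|x) and the baseline density b(y|x) satisfy ln[π(y|x)/b(y|x)] = r(x) + γV(y) − V(x) (written here in standard LMDP notation). Since the left side is a log density ratio, a binary classifier trained to separate optimal transitions (label 1) from baseline transitions (label 0), with its logit constrained to this functional form, recovers the reward r and value V by ordinary logistic-regression training. The following is a minimal PyTorch sketch of that idea, not the paper's implementation; the two-layer networks, discount factor, optimizer settings, and toy data are all illustrative assumptions.

```python
# Minimal sketch of logistic-regression-based deep IRL: a classifier whose
# logit is constrained to r(x) + gamma*V(y) - V(x) is trained to separate
# transitions from the optimal policy (label 1) from baseline transitions
# (label 0); cross-entropy training recovers reward and value networks jointly.
import torch
import torch.nn as nn

GAMMA = 0.9  # assumed discount factor


def mlp(in_dim):
    # Small two-layer network; the architecture is an assumption.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, 1))


class LogRegIRL(nn.Module):
    def __init__(self, state_dim):
        super().__init__()
        self.r = mlp(state_dim)  # reward network r(x)
        self.v = mlp(state_dim)  # value network V(x)

    def logit(self, x, y):
        # ln pi(y|x)/b(y|x) = r(x) + gamma*V(y) - V(x) under an LMDP
        return self.r(x) + GAMMA * self.v(y) - self.v(x)


def train(model, opt_xy, base_xy, steps=2000, lr=1e-3):
    """opt_xy / base_xy: (x, y) transition tensors sampled from the
    optimal and baseline policies, treated as labels 1 and 0."""
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        z1 = model.logit(*opt_xy)
        z0 = model.logit(*base_xy)
        loss = bce(z1, torch.ones_like(z1)) + bce(z0, torch.zeros_like(z0))
        optim.zero_grad()
        loss.backward()
        optim.step()
    return model


if __name__ == "__main__":
    torch.manual_seed(0)
    d = 4
    model = LogRegIRL(d)
    # Toy data: "optimal" transitions drift toward +1, baseline is random walk.
    x = torch.randn(256, d)
    opt_xy = (x, x + 0.1 + 0.05 * torch.randn(256, d))
    base_xy = (x, x + 0.05 * torch.randn(256, d))
    train(model, opt_xy, base_xy)
    print(model.r(x[:3]).detach())  # estimated rewards for a few states
```

Once trained, the estimated V can also serve as a potential for reward shaping of a new learner, which matches the use described in the abstract's Reversi experiment.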
