Author: Eiji Uchibe
DOI: 10.1007/978-3-319-46687-3_3
Keywords:
Abstract: This study proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. It is based on our previous method, which exploits the fact that the log of the ratio between an optimal state transition density and a baseline one is given in part by the difference of value functions under linearly solvable Markov decision processes, and can therefore be estimated by logistic regression. However, that method assumed the functions to be linear in basis functions prepared in advance. To overcome this limitation, we employ deep neural network frameworks in the implementation. Simulation results show that our method is comparable to model-based methods with less computing effort on the Objectworld benchmark. In addition, a policy trained with reward shaping using the estimated functions outperforms the policies used to collect the data in the game of Reversi.
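The abstract's core estimation step — recovering a log density ratio between two transition distributions via logistic regression — can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic density-ratio example on hypothetical 1-D data (the distributions, sample sizes, and `C` value are assumptions), showing that a classifier's logit recovers the log ratio when class priors are equal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical example: "optimal" samples drawn from N(1, 1),
# "baseline" samples from N(0, 1). The true log density ratio is
# log p_opt(x) - log p_base(x) = x - 0.5 (linear in x).
n = 5000
x_opt = rng.normal(1.0, 1.0, size=(n, 1))
x_base = rng.normal(0.0, 1.0, size=(n, 1))

X = np.vstack([x_opt, x_base])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = optimal, 0 = baseline

# With equal class priors, the logit f(x) = log p(y=1|x)/p(y=0|x)
# equals the log density ratio log p_opt(x)/p_base(x).
# Large C: weak regularization, so the fit is close to plain MLE.
clf = LogisticRegression(C=1e3).fit(X, y)

x_test = np.array([[0.0], [1.0]])
logit = clf.decision_function(x_test)  # estimated log density ratio
print(logit)  # close to the true values [-0.5, 0.5]
```

In the paper's setting, the linear model above is the limitation being addressed: replacing the linear logit with a deep network lets the same classification trick capture nonlinear reward structure.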