Author: Eiji Uchibe
DOI: 10.1007/978-3-319-46687-3_3
Keywords:
Abstract: This study proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. It is based on our previous method, which exploits the fact that the log of the ratio between an optimal state transition density and a baseline one is given in part by the difference of value functions under linearly solvable Markov decision processes, and can therefore be estimated by logistic regression. However, that method assumed the functions to be linear in basis functions prepared in advance. To overcome this limitation, we employ deep neural network frameworks in the implementation. Simulation results show that our method is comparable to model-based methods with less computing effort on the Objectworld benchmark. In addition, a policy trained with reward shaping using the estimated functions outperforms the policies used to collect the data in the game of Reversi.
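The abstract's core estimation step — recovering a log density ratio between two transition distributions via logistic regression — can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic density-ratio example on hypothetical 1-D data (the distributions, sample sizes, and `C` value are assumptions), showing that a classifier's logit recovers the log ratio when class priors are equal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical example: "optimal" samples drawn from N(1, 1),
# "baseline" samples from N(0, 1). The true log density ratio is
# log p_opt(x) - log p_base(x) = x - 0.5 (linear in x).
n = 5000
x_opt = rng.normal(1.0, 1.0, size=(n, 1))
x_base = rng.normal(0.0, 1.0, size=(n, 1))

X = np.vstack([x_opt, x_base])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = optimal, 0 = baseline

# With equal class priors, the logit f(x) = log p(y=1|x)/p(y=0|x)
# equals the log density ratio log p_opt(x)/p_base(x).
# Large C: weak regularization, so the fit is close to plain MLE.
clf = LogisticRegression(C=1e3).fit(X, y)

x_test = np.array([[0.0], [1.0]])
logit = clf.decision_function(x_test)  # estimated log density ratio
print(logit)  # close to the true values [-0.5, 0.5]
```

In the paper's setting, the linear model above is the limitation being addressed: replacing the linear logit with a deep network lets the same classification trick capture nonlinear reward structure.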