Truncated approximate dynamic programming with task-dependent terminal value

Authors: Daniel N. Nikovski, Amir-massoud Farahmand, Hiroki Konaka, Yuji Igarashi

DOI:

Keywords: Time horizon, Propagation of uncertainty, Bellman equation, Reinforcement learning, Markov decision process, Computer science, Mathematical optimization, Terminal value, Dynamic programming, Function (mathematics)

Abstract: We propose a new class of computationally fast algorithms to find a close-to-optimal policy for Markov Decision Processes (MDPs) with a large finite horizon T. The main idea is that instead of planning until the time horizon T, we plan only up to a truncated horizon H ≪ T and use an estimate of the true optimal value function as the terminal value. Our approach to finding the terminal value function is to learn a mapping from an MDP to its value function by solving many similar MDPs during a training phase and fitting a regression estimator. We analyze the method by providing an error propagation theorem that shows the effect of various sources of error on the quality of the solution. We also empirically validate this approach in a real-world application, designing an energy management system for Hybrid Electric Vehicles, with promising results.
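The truncated-horizon idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small tabular, time-homogeneous, undiscounted MDP, and the names `truncated_backward_induction`, `P`, `R`, and `terminal_value` are hypothetical. In place of the learned regression estimate, the demo uses the exact value-to-go at step H as the terminal value, to show that planning over H steps then reproduces the full T-step solution.

```python
import numpy as np


def truncated_backward_induction(P, R, H, terminal_value):
    """Plan H steps backward from a given terminal value (illustrative sketch).

    P: (A, S, S) array, P[a, s, s2] = probability of moving s -> s2 under action a.
    R: (A, S) array, R[a, s] = expected immediate reward for action a in state s.
    terminal_value: (S,) array standing in for the estimated value at step H.
    Returns the value at time 0 and a list of H greedy policies (time 0 first).
    """
    V = np.asarray(terminal_value, dtype=float)
    policy = []
    for _ in range(H):
        Q = R + P @ V              # (A, S): one-step lookahead for every action
        policy.append(Q.argmax(axis=0))
        V = Q.max(axis=0)          # Bellman optimality backup
    policy.reverse()               # built backward in time; return time 0 first
    return V, policy


# Toy MDP with 2 actions and 2 states (made-up numbers, assumption for the demo).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
T, H = 10, 3

# Full-horizon plan with zero value at time T.
V_full, pi_full = truncated_backward_induction(P, R, T, np.zeros(2))

# Exact value-to-go at step H; the paper would use a learned estimate here.
V_tail, _ = truncated_backward_induction(P, R, T - H, np.zeros(2))

# Truncated plan: only H backups, seeded with the terminal value.
V_trunc, pi_trunc = truncated_backward_induction(P, R, H, V_tail)
```

With an exact terminal value, `V_trunc` matches `V_full` and the first H decision rules coincide. With an approximate terminal value, the two would differ, which is the kind of effect the abstract's error propagation theorem addresses.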
