Authors: Daniel N. Nikovski, Amir-massoud Farahmand, Hiroki Konaka, Yuji Igarashi
DOI:
Keywords: Time horizon, Propagation of uncertainty, Bellman equation, Reinforcement learning, Markov decision process, Computer science, Mathematical optimization, Terminal value, Dynamic programming, Function (mathematics)
Abstract: We propose a new class of computationally fast algorithms for finding a close-to-optimal policy for Markov Decision Processes (MDPs) with a large finite horizon T. The main idea is that instead of planning until the time horizon T, we plan only up to a truncated horizon H ≪ T and use an estimate of the true value function as the terminal value. Our approach to finding such an estimate is to learn a mapping from an MDP to its value function by solving many similar MDPs during a training phase and fitting a regression estimator. We analyze the method by providing an error propagation theorem that shows the effect of various sources of error on the quality of the solution. We also empirically validate this approach in a real-world application, the design of an energy management system for Hybrid Electric Vehicles, with promising results.
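To make the truncated-horizon idea concrete, below is a minimal sketch of backward induction that plans only H steps and plugs in an estimated value at the truncation point, assuming a small tabular MDP. The function name `truncated_value_iteration` and the random test MDP are illustrative assumptions, not the paper's implementation; in the paper, the terminal value would come from a regression estimator trained on many similar MDPs, whereas here a zero vector stands in for it.

```python
# Sketch of truncated finite-horizon dynamic programming with a supplied
# terminal value (assumption: tabular MDP; the paper's learned regression
# estimate of the value function would replace `terminal_value` below).

import numpy as np

def truncated_value_iteration(P, R, H, terminal_value):
    """Backward induction over a truncated horizon H instead of the full T.

    P: transition probabilities, shape (A, S, S)
    R: rewards, shape (A, S)
    H: truncated planning horizon (H << T)
    terminal_value: estimated value at the truncation point, shape (S,)
    Returns the greedy policy (shape (H, S)) and the value at t = 0.
    """
    A, S, _ = P.shape
    V = terminal_value.copy()          # stands in for the true value beyond H
    policy = np.zeros((H, S), dtype=int)
    for t in reversed(range(H)):       # only H Bellman backups, not T
        Q = R + P @ V                  # Q[a, s] = R[a, s] + sum_s' P[a, s, s'] V[s']
        policy[t] = Q.argmax(axis=0)   # greedy action per state at step t
        V = Q.max(axis=0)
    return policy, V

# Usage on a random 2-action, 5-state MDP; a zero terminal value corresponds
# to plain truncation, and a learned estimate would be substituted here.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(5), size=(2, 5))   # each row sums to 1 over next states
R = rng.random((2, 5))
policy, V0 = truncated_value_iteration(P, R, H=10, terminal_value=np.zeros(5))
print(policy[0], V0)
```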