How to Dynamically Merge Markov Decision Processes

Authors: Satinder P. Singh, David Cohn

DOI:

Keywords: Dynamic programming; Computer science; Mathematical optimization; Markov decision process; Partially observable Markov decision process; Merge (version control)

Abstract: We are frequently called upon to perform multiple tasks that compete for our attention and resources. Often we know the optimal solution to each task in isolation; in this paper, we describe how this knowledge can be exploited to efficiently find good solutions for doing the tasks in parallel. We formulate this problem as that of dynamically merging multiple Markov decision processes (MDPs) into a composite MDP, and present a new theoretically-sound dynamic programming algorithm for finding an optimal policy for the composite MDP. We analyze various aspects of our algorithm and illustrate its use on a simple merging problem.
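The abstract's core construction can be illustrated with a small sketch: build a composite MDP whose states are tuples of component-MDP states, and solve it with value iteration. The toy dynamics, the rule that each composite action advances exactly one component, and the summed rewards are all assumptions for illustration; the paper's precise merging formulation and its specialized algorithm may differ.

```python
import itertools

# Two hypothetical component MDPs, each a deterministic chain 0..N-1.
# These dynamics are illustrative, not taken from the paper.
N = 3
GAMMA = 0.9

def step(s):
    """Advance a component one state toward its goal state N-1."""
    return min(s + 1, N - 1)

def reward(s):
    """A component pays reward 1 while it sits at its goal state."""
    return 1.0 if s == N - 1 else 0.0

# Composite MDP: states are pairs of component states; an action
# chooses which component gets to act this step (assumed merging rule).
states = list(itertools.product(range(N), range(N)))
actions = [0, 1]  # index of the component that advances

def composite_step(s, a):
    nxt = list(s)
    nxt[a] = step(s[a])
    return tuple(nxt)

def value_iteration(tol=1e-6):
    """Standard value iteration on the composite MDP."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            r = reward(s[0]) + reward(s[1])  # summed component rewards
            best = max(r + GAMMA * V[composite_step(s, a)] for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
```

At the joint goal state `(2, 2)` the value converges to 2/(1 - γ) = 20; the point of the composite formulation is that one policy trades off which task to attend to at each step, rather than solving each MDP in isolation.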

References (7)
John N. Tsitsiklis, Dimitri P. Bertsekas, Neuro-Dynamic Programming, (1996)
Leslie Pack Kaelbling, Nils J. Nilsson, Learning in Embedded Systems, (1993)
Andrew G. Barto, Steven J. Bradtke, Satinder P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol. 72, pp. 81-138, (1995), 10.1016/0004-3702(94)00011-O
Andrew W. Moore, Christopher G. Atkeson, Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, Machine Learning, vol. 13, pp. 103-130, (1993), 10.1023/A:1022635613229
Dimitri P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, (1995)
Thomas G. Dietterich, Wei Zhang, High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, Neural Information Processing Systems, vol. 8, pp. 1024-1030, (1995)
C. J. C. H. Watkins, Learning from delayed rewards, Ph.D. thesis, Cambridge University Psychology Department, (1989)