Targeting specific distributions of trajectories in MDPs

Authors: Michael L. Littman, Charles L. Isbell, David L. Roberts, Mark J. Nelson, Michael Mateas


Keywords: Computer science; State space; Space (commercial competition); Class (computer programming); Markov decision process; Distribution (mathematics); Trajectory; Mathematical optimization; Tree (data structure)

Abstract: We define TTD-MDPs, a novel class of Markov decision processes in which the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through that space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic for cases where an exact answer is impossible or impractical. We give empirical results for our algorithm in two domains: a synthetic grid world and the space of stories in an interactive drama or game.
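The "locally-consistent policy" idea from the abstract can be illustrated with a minimal sketch (my own hypothetical example, not the authors' implementation): if each trajectory-tree node carries a target probability mass, and transitions are assumed deterministic (one action per child), then the local policy at a node simply distributes the node's mass among its children in proportion to their target masses.

```python
# Hypothetical sketch of locally-consistent policies on a trajectory
# tree, assuming deterministic transitions (one action per child).
# Each node's target probability mass must equal the sum of its
# children's masses for the target distribution to be realizable.

def local_policy(node_mass, child_masses):
    """Return action probabilities that forward each child its share
    of the parent's target trajectory mass."""
    assert node_mass > 0, "node must have positive target mass"
    return [m / node_mass for m in child_masses]

# Example: the root (mass 1.0) leads to two subtrees; the target
# distribution over the three leaf trajectories is (0.5, 0.3, 0.2).
root_policy = local_policy(1.0, [0.5, 0.5])   # split evenly
left_policy = local_policy(0.5, [0.5])        # single child: take it w.p. 1
right_policy = local_policy(0.5, [0.3, 0.2])  # split 0.6 / 0.4
```

Following these local policies from the root reproduces the target distribution exactly: e.g. the first right-hand leaf is reached with probability 0.5 × 0.6 = 0.3. When transitions are stochastic, such an exact factoring may not exist, which is where the paper's heuristic comes in.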
