Authors: Michael L. Littman, Charles L. Isbell, David L. Roberts, Mark J. Nelson, Michael Mateas
DOI:
Keywords: Computer science, State space, Space (commercial competition), Class (computer programming), Markov decision process, Distribution (mathematics), Trajectory, Mathematical optimization, Tree (data structure)
Abstract: We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.