A globally optimal algorithm for TTD-MDPs

Authors: Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas

DOI: 10.1145/1329125.1329367

Keywords: Computer science; Distribution (number theory); Algorithm; Sampling (statistics); Trajectory; Greedy algorithm; Markov decision process; State (computer science); Mathematical optimization; Degree (graph theory); Convex optimization

Abstract: In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs), a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space, as a general agent-coordination framework. We present several advances to previous work on TTD-MDPs. We improve an existing algorithm for solving TTD-MDPs by deriving a greedy algorithm that finds a policy that provably minimizes the global KL-divergence from the target distribution. We test the new algorithm by applying it to drama management, where a system must coordinate the behavior of many agents to ensure that a game follows a coherent storyline, remains in keeping with the author's desires, and offers a high degree of replayability. Although we show that suboptimal greedy strategies will fail in some cases, we validate previous work suggesting that they can perform well in practice. We also show that our new algorithm provides guaranteed accuracy even in those cases, at little additional computational cost. Further, we illustrate how this approach can be applied online, eliminating the memory-intensive offline sampling required by the previous approach.
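The abstract's central computational step, choosing local action probabilities so that the induced distribution over successor trajectories is as close as possible (in KL-divergence) to a target distribution, is a small convex problem at each trajectory node. The sketch below illustrates that idea on a toy node; all numbers and names are hypothetical, and plain exponentiated-gradient descent on the action simplex stands in for the paper's actual greedy derivation.

```python
import numpy as np

# Hypothetical toy node: 2 actions, 3 possible successor trajectories.
# T[a, i] = P(successor trajectory i | action a), each row a distribution.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
target = np.array([0.4, 0.25, 0.35])  # desired successor distribution

# Minimize KL(induced || target) over action probabilities pi using
# exponentiated-gradient descent, which keeps pi on the simplex.
pi = np.array([0.9, 0.1])             # deliberately bad starting point
for _ in range(2000):
    induced = pi @ T                              # mixture of the rows of T
    grad = T @ (np.log(induced / target) + 1.0)   # dKL/dpi via chain rule
    pi *= np.exp(-0.1 * grad)                     # multiplicative update
    pi /= pi.sum()                                # renormalize

print(pi, pi @ T)
```

In this toy instance the target happens to be exactly reachable (a 50/50 mix of the two action rows reproduces it), so the induced distribution converges to the target; in general the target may lie outside the reachable set, which is exactly the case where a provably KL-optimal policy matters.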

References (22)
Brian Magerko. Story representation and interactive drama. National Conference on Artificial Intelligence, pp. 87-92 (2005).
R. Michael Young, Mark O. Riedl, Arnav Jhala, C. J. Saretto, Mark Branly, R. J. Martin. An architecture for integrating plan-based behavior generation with interactive game environments. Journal of Game Development, vol. 1, pp. 1-29 (2004).
Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings 1994, pp. 157-163 (1994). DOI: 10.1016/B978-1-55860-335-6.50027-1
Mark J. Nelson, Michael Mateas. Search-based drama management in the interactive fiction Anchorhead. National Conference on Artificial Intelligence, pp. 99-104 (2005).
Mark J. Nelson, David L. Roberts, Charles L. Isbell, Michael Mateas. Reinforcement learning for declarative optimization-based drama management. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '06), pp. 775-782 (2006). DOI: 10.1145/1160633.1160769
Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, vol. 6, pp. 215-219 (1994). DOI: 10.1162/NECO.1994.6.2.215
Andrew Y. Ng, Stuart Russell. Algorithms for Inverse Reinforcement Learning. International Conference on Machine Learning, pp. 663-670 (2000).
Bradford W. Mott, James C. Lester. U-director. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '06), pp. 977-984 (2006). DOI: 10.1145/1160633.1160808
Gerald Tesauro. Practical Issues in Temporal Difference Learning. Machine Learning, vol. 8, pp. 257-277 (1992). DOI: 10.1007/BF00992697