A globally optimal algorithm for TTD-MDPs

Authors: Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas

DOI: 10.1145/1329125.1329367

Keywords: Computer science; Distribution (number theory); Algorithm; Sampling (statistics); Trajectory; Greedy algorithm; Markov decision process; State (computer science); Mathematical optimization; Degree (graph theory); Convex optimization

Abstract: In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs), a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space, as a general agent-coordination framework. We present several advances to previous work on TTD-MDPs. We improve an existing algorithm for solving TTD-MDPs by deriving a greedy algorithm that finds a policy that provably minimizes the global KL-divergence from the target distribution. We test the new algorithm by applying it to drama management, where a system must coordinate the behavior of many agents to ensure that a game follows a coherent storyline, remains in keeping with the author's desires, and offers a high degree of replayability. Although we show that suboptimal greedy strategies will fail in some cases, we validate previous work suggesting that they can perform well in practice. We also show that our new algorithm provides guaranteed accuracy even in those cases, at little additional computational cost. Further, we illustrate how this approach can be applied online, eliminating the memory-intensive offline sampling required by the previous approach.
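The abstract's central computational step, choosing local action probabilities so that the induced distribution over successor trajectories is as close as possible (in KL-divergence) to a target distribution, is a small convex problem at each trajectory node. The sketch below illustrates that idea on a toy node; all numbers and names are hypothetical, and plain exponentiated-gradient descent on the action simplex stands in for the paper's actual greedy derivation.

```python
import numpy as np

# Hypothetical toy node: 2 actions, 3 possible successor trajectories.
# T[a, i] = P(successor trajectory i | action a), each row a distribution.
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
target = np.array([0.4, 0.25, 0.35])  # desired successor distribution

# Minimize KL(induced || target) over action probabilities pi using
# exponentiated-gradient descent, which keeps pi on the simplex.
pi = np.array([0.9, 0.1])             # deliberately bad starting point
for _ in range(2000):
    induced = pi @ T                              # mixture of the rows of T
    grad = T @ (np.log(induced / target) + 1.0)   # dKL/dpi via chain rule
    pi *= np.exp(-0.1 * grad)                     # multiplicative update
    pi /= pi.sum()                                # renormalize

print(pi, pi @ T)
```

In this toy instance the target happens to be exactly reachable (a 50/50 mix of the two action rows reproduces it), so the induced distribution converges to the target; in general the target may lie outside the reachable set, which is exactly the case where a provably KL-optimal policy matters.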

References (22)
Brian Magerko. Story representation and interactive drama. National Conference on Artificial Intelligence, pp. 87-92 (2005).
R. Michael Young, Mark O. Riedl, Arnav Jhala, C. J. Saretto, Mark Branly, R. J. Martin. An architecture for integrating plan-based behavior generation with interactive game environments. Journal of Game Development, vol. 1, pp. 1-29 (2004).
Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings 1994, pp. 157-163 (1994). DOI: 10.1016/B978-1-55860-335-6.50027-1
Mark J. Nelson, Michael Mateas. Search-based drama management in the interactive fiction Anchorhead. National Conference on Artificial Intelligence, pp. 99-104 (2005).
Mark J. Nelson, David L. Roberts, Charles L. Isbell, Michael Mateas. Reinforcement learning for declarative optimization-based drama management. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '06), pp. 775-782 (2006). DOI: 10.1145/1160633.1160769
Gerald Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, vol. 6, pp. 215-219 (1994). DOI: 10.1162/NECO.1994.6.2.215
Andrew Y. Ng, Stuart Russell. Algorithms for Inverse Reinforcement Learning. International Conference on Machine Learning, pp. 663-670 (2000).
Bradford W. Mott, James C. Lester. U-director. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '06), pp. 977-984 (2006). DOI: 10.1145/1160633.1160808
Gerald Tesauro. Practical Issues in Temporal Difference Learning. Machine Learning, vol. 8, pp. 257-277 (1992). DOI: 10.1007/BF00992697