Authors: Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas
Keywords: Computer science, Probability distribution, Algorithm, Sampling (statistics), Trajectory, Greedy algorithm, Markov decision process, State (computer science), Mathematical optimization, Degree (graph theory), Convex optimization
Abstract: In this paper, we discuss the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs)---a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space---as a general agent-coordination framework. We present several advances over previous work on TTD-MDPs. We improve the existing algorithm for solving TTD-MDPs by deriving a greedy algorithm that finds a policy that provably minimizes the global KL-divergence from the target distribution. We test the new algorithm by applying it to drama management, where a system must coordinate the behavior of many agents to ensure that a game follows a coherent storyline, stays in keeping with the author's desires, and offers a high degree of replayability. Although we show that suboptimal greedy strategies will fail in some cases, our validation suggests they can work well in practice. We also show that our algorithm provides guaranteed accuracy even in those cases, at little additional computational cost. Further, we illustrate how our approach can be applied online, eliminating the memory-intensive offline sampling necessary in the previous approach.
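To make the core idea concrete: a TTD-MDP targets a distribution over whole trajectories rather than a reward. In the simplest setting, where trajectories form a tree and each action deterministically selects the next trajectory node, the KL-divergence from the target is driven to zero by choosing, at every trajectory prefix, the branch probabilities equal to the conditional target mass of each branch. The sketch below illustrates only this special case; the trajectory names and the `greedy_policy` helper are hypothetical, not from the paper, and the paper's full algorithm additionally handles stochastic transitions via per-state convex optimization.

```python
# Hypothetical trajectory tree for illustration. The target distribution
# assigns probability mass to complete trajectories (tuples of states).
target = {
    ("s0", "s1a"): 0.25,
    ("s0", "s1b", "s2a"): 0.45,
    ("s0", "s1b", "s2b"): 0.30,
}

def greedy_policy(target):
    """For each trajectory prefix, set the probability of each successor
    to the conditional target mass flowing through that branch. With
    deterministic transitions this exactly realizes the target
    distribution, so the KL-divergence from it is zero."""
    # Total target mass passing through each trajectory prefix.
    mass = {}
    for traj, p in target.items():
        for i in range(1, len(traj) + 1):
            prefix = traj[:i]
            mass[prefix] = mass.get(prefix, 0.0) + p
    # Policy: prefix -> {next state: probability}.
    policy = {}
    for traj in target:
        for i in range(1, len(traj)):
            prefix, nxt = traj[:i], traj[i]
            policy.setdefault(prefix, {})[nxt] = mass[traj[:i + 1]] / mass[prefix]
    return policy

pol = greedy_policy(target)
print(pol[("s0",)])         # branch probabilities at the root
print(pol[("s0", "s1b")])   # conditional probabilities one step deeper
```

Running this, the root branches carry probabilities 0.25 and 0.75 (the marginal target mass of each subtree), and the deeper node splits its 0.75 mass as 0.6 / 0.4 conditionally, matching the 0.45 / 0.30 targets.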