Targeting specific distributions of trajectories in MDPs

Authors: Michael L. Littman, Charles L. Isbell, David L. Roberts, Mark J. Nelson, Michael Mateas


Keywords: Computer science; State space; Space (commercial competition); Class (computer programming); Markov decision process; Distribution (mathematics); Trajectory; Mathematical optimization; Tree (data structure)

Abstract: We define TTD-MDPs, a novel class of Markov decision processes in which the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through that space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic for cases where an exact answer is impossible or impractical. We give empirical results for our algorithm in two domains: a synthetic grid world and the space of stories in an interactive drama or game.
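The "locally-consistent policy" idea from the abstract can be illustrated with a minimal sketch (my own hypothetical example, not the authors' implementation): if each trajectory-tree node carries a target probability mass, and transitions are assumed deterministic (one action per child), then the local policy at a node simply distributes the node's mass among its children in proportion to their target masses.

```python
# Hypothetical sketch of locally-consistent policies on a trajectory
# tree, assuming deterministic transitions (one action per child).
# Each node's target probability mass must equal the sum of its
# children's masses for the target distribution to be realizable.

def local_policy(node_mass, child_masses):
    """Return action probabilities that forward each child its share
    of the parent's target trajectory mass."""
    assert node_mass > 0, "node must have positive target mass"
    return [m / node_mass for m in child_masses]

# Example: the root (mass 1.0) leads to two subtrees; the target
# distribution over the three leaf trajectories is (0.5, 0.3, 0.2).
root_policy = local_policy(1.0, [0.5, 0.5])   # split evenly
left_policy = local_policy(0.5, [0.5])        # single child: take it w.p. 1
right_policy = local_policy(0.5, [0.3, 0.2])  # split 0.6 / 0.4
```

Following these local policies from the root reproduces the target distribution exactly: e.g. the first right-hand leaf is reached with probability 0.5 × 0.6 = 0.3. When transitions are stochastic, such an exact factoring may not exist, which is where the paper's heuristic comes in.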
