PEGASUS: A policy search method for large MDPs and POMDPs

Keywords: Mathematical optimization, Exponential function, Partially observable Markov decision process, Action (physics), Value (ethics), Space (mathematics), State (functional analysis), Mathematics, Markov decision process, Polynomial

Abstract: We propose a new approach to the problem of searching a space of policies for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP), given a model. Our approach is based on the following observation: Any (PO)MDP can be transformed into an "equivalent" POMDP in which all state transitions (given the current state and action) are deterministic. This reduces the general problem of policy search to one in which we need only consider POMDPs with deterministic transitions. We give a natural way of estimating the value of all policies in these transformed POMDPs. Policy search is then simply performed by searching for a policy with high estimated value. We also establish conditions under which our value estimates will be good, recovering theoretical results similar to those of Kearns, Mansour and Ng [7], but with "sample complexity" bounds that have only a polynomial rather than exponential dependence on the horizon time. Our method applies to arbitrary POMDPs, including ones with infinite state and action spaces. We also present empirical results for our approach on a small discrete problem, and on a complex continuous state/continuous action problem involving learning to ride a bicycle.
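The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical toy MDP (a walk on the integers 0 to 10 whose chosen move is reversed with probability 0.2), a hypothetical family of threshold policies, and assumed values for the discount factor and horizon. The point it shows is that once the random numbers for each "scenario" are drawn up front and reused, every transition is deterministic, so the value estimate of a policy becomes an ordinary deterministic function that can be compared across policies and maximized.

```python
import random

GAMMA = 0.95    # discount factor (assumed for this sketch)
HORIZON = 30    # truncation horizon (assumed for this sketch)

def step(state, action, p):
    """Deterministic transition: the pre-drawn number p in [0, 1) carries all randomness.

    Toy dynamics (hypothetical): the chosen move is reversed when p < 0.2.
    """
    move = action if p >= 0.2 else -action
    return max(0, min(10, state + move))

def reward(state):
    """Toy reward (hypothetical): 1 for being in state 10, else 0."""
    return 1.0 if state == 10 else 0.0

def make_scenarios(m, horizon=HORIZON, seed=0):
    """Draw m scenarios: an initial state plus a fixed sequence of random numbers."""
    rng = random.Random(seed)
    return [(rng.randint(0, 10), [rng.random() for _ in range(horizon)])
            for _ in range(m)]

def threshold_policy(theta):
    """Hypothetical policy class: move right (+1) iff state >= theta, else left (-1)."""
    return lambda s: 1 if s >= theta else -1

def estimate_value(policy, scenarios):
    """Average discounted return of the policy over the fixed scenarios."""
    total = 0.0
    for s0, noise in scenarios:
        s, discount = s0, 1.0
        for p in noise:
            total += discount * reward(s)
            s = step(s, policy(s), p)   # same p for every policy being evaluated
            discount *= GAMMA
    return total / len(scenarios)

def policy_search(thetas, scenarios):
    """Exhaustive search over the policy class for the highest estimated value."""
    return max(thetas,
               key=lambda th: estimate_value(threshold_policy(th), scenarios))
```

Because the scenarios are fixed, calling `estimate_value` twice on the same policy returns the same number, and `policy_search(range(12), make_scenarios(20))` compares all candidate policies on identical random draws. For a continuous parameter space, the exhaustive `max` would be replaced by whatever optimizer suits the policy class.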

References (17)

Hajime Kimura, Masayuki Yamamura, Shigenobu Kobayashi. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward. Machine Learning Proceedings 1995, pp. 295-303 (1995). DOI: 10.1016/B978-1-55860-377-6.50044-X

David Pollard. Empirical Processes: Theory and Applications (1990).

Jette Randløv, Preben Alstrøm. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping. International Conference on Machine Learning, pp. 463-471 (1998).

John N. Tsitsiklis, Benjamin Van Roy. Learning and value function approximation in complex decision processes. Massachusetts Institute of Technology (1998).

Damien Ernst, Arthur Louette. Introduction to Reinforcement Learning. MIT Press (1998).

Vladimir Naumovich Vapnik. Estimation of Dependences Based on Empirical Data (2010).

John R. Birge, François Louveaux. Introduction to Stochastic Programming (2011).

Leonid Peshkin, Leslie Pack Kaelbling, Kee-Eung Kim, Nicolas Meuleau. Learning finite-state controllers for partially observable environments. Uncertainty in Artificial Intelligence, pp. 427-436 (1999).

Paul Goldberg, Mark Jerrum. Bounding the Vapnik-Chervonenkis dimension of concept classes parameterized by real numbers. Conference on Learning Theory, vol. 18, pp. 361-369 (1993). DOI: 10.1145/168304.168377

David Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information & Computation, vol. 100, pp. 78-150 (1992). DOI: 10.1016/0890-5401(92)90010-D