Hierarchical POMDP controller optimization by likelihood maximization

Authors: Laurent Charlin, Pascal Poupart, Marc Toussaint

Abstract: Planning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. [4] recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint [18] developed a method to solve planning problems by maximum-likelihood estimation. In this paper, we show how the hierarchy discovery problem in partially observable domains can be tackled using a similar maximum likelihood approach. Our technique first transforms the problem into a dynamic Bayesian network through which a hierarchical structure can naturally be discovered while optimizing the policy. Experimental results demonstrate that this approach scales better than previous techniques based on non-convex optimization.

References (18)

Shlomo Zilberstein, Christopher Amato, Daniel S. Bernstein. Solving POMDPs using quadratically constrained linear programs. International Joint Conference on Artificial Intelligence, pp. 2418-2424 (2007).

Anthony Rocco Cassandra, Leslie Pack Kaelbling. Exact and approximate algorithms for partially observable Markov decision processes. Brown University (1998).

Amos Storkey, Stefan Harmeling, Marc Toussaint. Probabilistic inference for solving (PO)MDPs. School of Informatics, Institute for Adaptive and Neural Computation (2006).

Darius Braziunas, Craig Boutilier. Stochastic local search for POMDP controllers. National Conference on Artificial Intelligence, pp. 690-696 (2004).

Pascal Poupart, Jesse Hoey, Alex Mihailidis, Axel von Bertoldi. Assisting persons with dementia during handwashing using a partially observable Markov decision process. International Conference on Computer Vision Systems (2007). DOI: 10.2390/BIECOLL-ICVS2007-89

Sebastian Thrun, Joelle Pineau, Geoff Gordon. Policy-contingent abstraction for robust robot control. Uncertainty in Artificial Intelligence, pp. 477-484 (2002).

Leonid Peshkin, Leslie Pack Kaelbling, Kee-Eung Kim, Nicolas Meuleau. Learning finite-state controllers for partially observable environments. Uncertainty in Artificial Intelligence, pp. 427-436 (1999).

A. P. Dempster, N. M. Laird, D. B. Rubin. Maximum Likelihood from Incomplete Data Via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22 (1977). DOI: 10.1111/J.2517-6161.1977.TB01600.X

Eric A. Hansen. An Improved Policy Iteration Algorithm for Partially Observable MDPs. Neural Information Processing Systems, vol. 10, pp. 1015-1021 (1997).

G. Theocharous, K. Murphy, L.P. Kaelbling. Representing hierarchical POMDPs as DBNs for multi-scale robot localization. International Conference on Robotics and Automation, vol. 1, pp. 1045-1051 (2004). DOI: 10.1109/ROBOT.2004.1307288
