Hierarchical POMDP controller optimization by likelihood maximization

Authors: Laurent Charlin, Pascal Poupart, Marc Toussaint

DOI:

Keywords:

Abstract: Planning can often be simplified by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. [4] recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational difficulty of solving such an optimization problem makes it hard to scale to real-world problems. In another line of research, Toussaint et al. [18] developed a method to solve planning problems by maximum-likelihood estimation. In this paper, we show how the hierarchy discovery problem in partially observable domains can be tackled using a similar maximum likelihood approach. Our technique first transforms the problem into a dynamic Bayesian network through which a hierarchical structure can naturally be discovered while optimizing the policy. Experimental results demonstrate that this approach scales better than previous techniques based on non-convex optimization.

References (18)
Shlomo Zilberstein, Christopher Amato, Daniel S. Bernstein. Solving POMDPs using quadratically constrained linear programs. International Joint Conference on Artificial Intelligence, pp. 2418-2424 (2007).
Anthony Rocco Cassandra, Leslie Pack Kaelbling. Exact and approximate algorithms for partially observable Markov decision processes. Brown University (1998).
Amos Storkey, Stefan Harmeling, Marc Toussaint. Probabilistic inference for solving (PO)MDPs. School of Informatics, Institute for Adaptive and Neural Computation (2006).
Darius Braziunas, Craig Boutilier. Stochastic local search for POMDP controllers. National Conference on Artificial Intelligence, pp. 690-696 (2004).
Pascal Poupart, Jesse Hoey, Alex Mihailidis, Axel von Bertoldi. Assisting persons with dementia during handwashing using a partially observable Markov decision process. International Conference on Computer Vision Systems (2007). DOI: 10.2390/BIECOLL-ICVS2007-89
Sebastian Thrun, Joelle Pineau, Geoff Gordon. Policy-contingent abstraction for robust robot control. Uncertainty in Artificial Intelligence, pp. 477-484 (2002).
Leonid Peshkin, Leslie Pack Kaelbling, Kee-Eung Kim, Nicolas Meuleau. Learning finite-state controllers for partially observable environments. Uncertainty in Artificial Intelligence, pp. 427-436 (1999).
A. P. Dempster, N. M. Laird, D. B. Rubin. Maximum Likelihood from Incomplete Data Via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22 (1977). DOI: 10.1111/J.2517-6161.1977.TB01600.X
Eric A. Hansen. An Improved Policy Iteration Algorithm for Partially Observable MDPs. Neural Information Processing Systems, vol. 10, pp. 1015-1021 (1997).
G. Theocharous, K. Murphy, L.P. Kaelbling. Representing hierarchical POMDPs as DBNs for multi-scale robot localization. International Conference on Robotics and Automation, vol. 1, pp. 1045-1051 (2004). DOI: 10.1109/ROBOT.2004.1307288