Authors: Odalric-Ambrym Maillard, Edouard Leurent
DOI:
Keywords: Markov decision process, Budget constraint, Mathematical optimization, Generative model, Open-loop controller, Sample complexity, Time complexity, Computer science
Abstract: We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies, i.e. sequences of actions, and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KLOLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.