Practical Open-Loop Optimistic Planning

Authors: Odalric-Ambrym Maillard, Edouard Leurent

Keywords: Markov decision process, Budget constraint, Mathematical optimization, Generative model, Open-loop controller, Sample complexity, Time complexity, Computer science

Abstract: We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies (i.e. sequences of actions), and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KLOLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
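
The abstract does not give the form of the tighter bounds, so the following is only an illustrative sketch of why a Kullback-Leibler based upper-confidence bound can be less conservative than a Hoeffding-style one. It assumes Bernoulli-like rewards in [0, 1]; the function names and the `threshold` parameter (the confidence level, e.g. a log term in the budget) are hypothetical and are not taken from the paper.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    eps = 1e-15
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def hoeffding_upper_bound(mean, count, threshold):
    """Hoeffding-style bound: empirical mean plus a sqrt bonus, capped at 1."""
    return min(1.0, mean + math.sqrt(threshold / (2 * count)))


def kl_upper_bound(mean, count, threshold, tol=1e-9):
    """Largest q in [mean, 1] with count * kl(mean, q) <= threshold, by bisection."""
    low, high = mean, 1.0
    while high - low > tol:
        mid = (low + high) / 2
        if count * bernoulli_kl(mean, mid) <= threshold:
            low = mid   # mid is still plausible, search higher
        else:
            high = mid  # mid is ruled out, search lower
    return low


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples, empirical mean 0.9, threshold 2.0.
    print("Hoeffding:", hoeffding_upper_bound(0.9, 10, 2.0))
    print("KL       :", kl_upper_bound(0.9, 10, 2.0))
```

By Pinsker's inequality, the KL-based bound computed this way never exceeds the Hoeffding-style bound for the same threshold, and it always stays within [0, 1], which is one reason such bounds tend to be less conservative when few samples are available.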