Authors: Verena Heidrich-Meisner, Christian Igel
Keywords:
Abstract: Uncertainty arises in reinforcement learning from various sources, and therefore it is necessary to consider statistics based on several roll-outs for evaluating behavioral policies. We add an adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes considered for the evaluation of a policy. The performance estimation is kept just accurate enough for a sufficiently good ranking of candidate policies, which in turn is sufficient for the CMA-ES to find better solutions. This increases the learning speed as well as the robustness of the algorithm.
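To make the racing idea concrete, here is a minimal Python sketch of a Hoeffding race that selects the `mu` best of several candidate policies from noisy episode returns. It is an illustration under stated assumptions, not the authors' implementation: the names `hoeffding_radius`, `hoeffding_race`, `evaluate`, `value_range`, `delta`, and `t_max` are hypothetical, returns are assumed to lie in an interval of known width, the same `delta` is used for every interval without any union-bound correction, only the Hoeffding bound is shown (not the empirical Bernstein bound), and the coupling to the CMA-ES is left out.

```python
import math
import random


def hoeffding_radius(t, value_range, delta):
    """Half-width of a two-sided Hoeffding confidence interval after t samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * t))


def hoeffding_race(evaluate, candidates, mu, value_range, delta=0.05, t_max=200):
    """Select the mu best candidates, sampling each only as often as needed.

    evaluate(candidate) must return one noisy episode return that lies in an
    interval of width value_range (an assumption of this sketch).
    Returns the sorted indices of the selected candidates and the number of
    episodes spent on each candidate.
    """
    n = len(candidates)
    sums, counts = [0.0] * n, [0] * n
    undecided, selected, discarded = set(range(n)), set(), set()

    for _ in range(t_max):
        if len(selected) >= mu or len(discarded) >= n - mu:
            break
        # One more episode for every candidate that is still undecided.
        for i in undecided:
            sums[i] += evaluate(candidates[i])
            counts[i] += 1
        means = [sums[i] / counts[i] for i in range(n)]
        radii = [hoeffding_radius(counts[i], value_range, delta) for i in range(n)]
        lower = [m - r for m, r in zip(means, radii)]
        upper = [m + r for m, r in zip(means, radii)]
        for i in list(undecided):
            worse = sum(1 for j in range(n) if j != i and upper[j] < lower[i])
            better = sum(1 for j in range(n) if j != i and lower[j] > upper[i])
            if worse >= n - mu:      # confidently among the mu best
                selected.add(i)
                undecided.discard(i)
            elif better >= mu:       # confidently not among the mu best
                discarded.add(i)
                undecided.discard(i)

    # If the race ended without mu confident selections, fill the remaining
    # slots with the undecided candidates that have the best empirical mean.
    if len(selected) < mu:
        rest = sorted(undecided, key=lambda i: sums[i] / max(counts[i], 1),
                      reverse=True)
        selected.update(rest[: mu - len(selected)])
    return sorted(selected), counts


# Toy usage: four "policies" described only by their true mean return.
rng = random.Random(0)
true_means = [0.1, 0.2, 0.8, 0.9]
selected, counts = hoeffding_race(
    lambda m: m + rng.uniform(-0.1, 0.1), true_means, mu=2, value_range=1.0
)
print(selected, counts)
```

In this toy setting the two groups of candidates are well separated, so the race should end before `t_max` episodes per candidate. Candidates whose returns are closer together would receive more episodes, which is the individual adjustment of the evaluation budget that the abstract describes.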