Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search

Authors: Verena Heidrich-Meisner, Christian Igel

DOI: 10.1145/1553374.1553426

Abstract: Uncertainty arises in reinforcement learning from various sources, and therefore it is necessary to consider statistics based on several roll-outs for evaluating behavioral policies. We add an adaptive uncertainty handling based on Hoeffding and empirical Bernstein races to the CMA-ES, a variable metric evolution strategy proposed for direct policy search. The uncertainty handling individually adjusts the number of episodes considered for the evaluation of a policy. The performance estimation is kept just accurate enough for a sufficiently good ranking of candidate policies, which is in turn sufficient for the CMA-ES to find better solutions. This increases the speed as well as the robustness of the algorithm.
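
The abstract does not spell out the bounds or the race, so the following Python code is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes returns lie in a known interval of width R and uses, for n episodes and confidence parameter δ, the two-sided Hoeffding half-width R·sqrt(ln(2/δ)/(2n)) and the empirical Bernstein half-width σ̂·sqrt(2 ln(3/δ)/n) + 3R·ln(3/δ)/n (Audibert, Munos and Szepesvári), where σ̂ is the empirical standard deviation. Among λ candidate policies, a policy keeps being evaluated only while it is undecided: it leaves the race once its confidence interval shows it is in, or out of, the μ best. The helper names (hoeffding_radius, bernstein_radius, race_select), the per-test level δ/(λ·max_evals) from a crude union bound, and the fallback to empirical means when the budget runs out are hypothetical choices for illustration.

```python
import math
import random
from typing import Callable, List, Set, Tuple


def hoeffding_radius(n: int, value_range: float, delta: float) -> float:
    """Two-sided Hoeffding half-width for the mean of n returns in a range of width value_range."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def bernstein_radius(samples: List[float], value_range: float, delta: float) -> float:
    """Empirical Bernstein half-width in the form of Audibert, Munos and Szepesvari."""
    n = len(samples)
    mean = sum(samples) / n
    # Biased (1/n) empirical variance, as used in the empirical Bernstein bound.
    var = sum((x - mean) ** 2 for x in samples) / n
    log_term = math.log(3.0 / delta)
    return math.sqrt(2.0 * var * log_term / n) + 3.0 * value_range * log_term / n


def race_select(
    evaluate: Callable[[int], float],
    num_candidates: int,
    mu: int,
    value_range: float,
    delta: float = 0.05,
    max_evals: int = 1000,
    use_bernstein: bool = True,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: race candidates to find the mu with the highest mean return.

    Returns the chosen indices and the number of episodes spent on each candidate.
    """
    samples = [[evaluate(i)] for i in range(num_candidates)]
    # Assumption: crude union bound over candidates and rounds.
    delta_single = delta / (num_candidates * max_evals)
    undecided: Set[int] = set(range(num_candidates))
    selected: Set[int] = set()
    n_discarded = 0
    while True:
        lo, hi = [], []
        for s in samples:
            m = sum(s) / len(s)
            if use_bernstein:
                r = bernstein_radius(s, value_range, delta_single)
            else:
                r = hoeffding_radius(len(s), value_range, delta_single)
            lo.append(m - r)
            hi.append(m + r)
        for i in sorted(undecided):
            others = [j for j in range(num_candidates) if j != i]
            beats = sum(1 for j in others if lo[i] > hi[j])      # i surely better than j
            beaten_by = sum(1 for j in others if lo[j] > hi[i])  # j surely better than i
            if beats >= num_candidates - mu:
                selected.add(i)   # in the top mu with high probability
                undecided.discard(i)
            elif beaten_by >= mu:
                n_discarded += 1  # outside the top mu with high probability
                undecided.discard(i)
        done = len(selected) >= mu or n_discarded >= num_candidates - mu
        active = [i for i in undecided if len(samples[i]) < max_evals]
        if done or not active:
            break
        for i in active:  # one more episode, only for still-undecided policies
            samples[i].append(evaluate(i))
    means = [sum(s) / len(s) for s in samples]
    # Race winners first; any remaining slots (budget exhausted) are filled by empirical mean.
    ranking = sorted(range(num_candidates), key=lambda i: (i in selected, means[i]), reverse=True)
    return ranking[:mu], [len(s) for s in samples]


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.1, 0.3, 0.5, 0.7, 0.9]  # toy policies with returns in [0, 1]

    def evaluate(i: int) -> float:
        return true_means[i] + rng.uniform(-0.05, 0.05)

    for use_b in (False, True):
        chosen, episodes = race_select(evaluate, 5, 2, 1.0, max_evals=2000, use_bernstein=use_b)
        print("Bernstein" if use_b else "Hoeffding", sorted(chosen), episodes)
```

On such toy data, clearly inferior policies typically drop out of the race after far fewer episodes than the borderline ones, which is the per-policy adaptation the abstract refers to. The empirical Bernstein interval pays off once n is large enough for its 3R·ln(3/δ)/n term to shrink and the return variance is small relative to R; for very few episodes the Hoeffding interval can be tighter. In a CMA-ES, μ would presumably correspond to the number of parents selected from λ offspring (an assumption about how the race is wired in).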