Balloon Estimators for Improving and Scaling the Nonparametric Off-Policy Policy Gradient

Authors: Fabio d’Aquino Hilt, Jan Niklas Kolf, Christian Weiland, Joao Carvalho, Samuele Tosatto

Abstract: The Nonparametric Off-Policy Policy Gradient (NOPG) introduces an algorithm to solve reinforcement learning tasks in continuous environments with low sample complexity …
