作者: Fabio d’Aquino Hilt , JanNiklas Kolf , Christian Weiland , Joao Carvalho , Samuele Tosatto
DOI:
关键词:
摘要: Abstract The Nonparametric Off-Policy Policy Gradient (NOPG) introduces an algorithm to solve reinforcement learning tasks in continuous environments with low sample complexity …