A Nonparametric Off-Policy Policy Gradient

Authors: Jan Peters, Hany Abdulsamad, Samuele Tosatto, Joao Carvalho

Abstract: Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is …

References (31)

Martin Riedmiller. Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method. Machine Learning: ECML 2005, pp. 317-328 (2005). DOI: 10.1007/11564096_32

Dirk Ormoneit, Śaunak Sen. Kernel-based reinforcement learning (1999).

Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau. Exploration in Gradient-Based Reinforcement Learning (2001).

Leemon Baird. Residual Algorithms: Reinforcement Learning with Function Approximation. Machine Learning Proceedings 1995, pp. 30-37 (1995). DOI: 10.1016/B978-1-55860-377-6.50013-X

John Schulman et al. Trust Region Policy Optimization. International Conference on Machine Learning, pp. 1889-1897 (2015).

Christian R. Shelton. Policy improvement for POMDPs using normalized importance sampling. Uncertainty in Artificial Intelligence, pp. 496-503 (2001).

E. A. Nadaraya. On Estimating Regression. Theory of Probability and Its Applications, vol. 9, pp. 141-142 (1964). DOI: 10.1137/1109020

Gavin Taylor, Ronald Parr. Kernelized value function approximation for reinforcement learning. Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09, pp. 1017-1024 (2009). DOI: 10.1145/1553374.1553504

Jianqing Fan. Design-adaptive Nonparametric Regression. Journal of the American Statistical Association, vol. 87, pp. 998-1004 (1992). DOI: 10.1080/01621459.1992.10476255

Xin Xu, Dewen Hu, Xicheng Lu. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning. IEEE Transactions on Neural Networks, vol. 18, pp. 973-992 (2007). DOI: 10.1109/TNN.2007.899161

Similar articles (2)

Statistically Efficient Off-Policy Policy Gradients.

An Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions
