A Nonparametric Off-Policy Policy Gradient

Authors: Samuele Tosatto, Joao Carvalho, Hany Abdulsamad, Jan Peters

DOI:

Keywords:

Abstract: Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is …

References (31)
Dirk Ormoneit, Śaunak Sen. Kernel-based reinforcement learning (1999).
Leonid Peshkin, Kee-Eung Kim, Nicolas Meuleau. Exploration in Gradient-Based Reinforcement Learning (2001).
Leemon Baird. Residual Algorithms: Reinforcement Learning with Function Approximation. Machine Learning Proceedings 1995, pp. 30–37 (1995). DOI: 10.1016/B978-1-55860-377-6.50013-X
John Schulman et al. Trust Region Policy Optimization. International Conference on Machine Learning, pp. 1889–1897 (2015).
Christian R. Shelton. Policy Improvement for POMDPs Using Normalized Importance Sampling. Uncertainty in Artificial Intelligence, pp. 496–503 (2001).
E. A. Nadaraya. On Estimating Regression. Theory of Probability and Its Applications, vol. 9, pp. 141–142 (1964). DOI: 10.1137/1109020
Gavin Taylor, Ronald Parr. Kernelized Value Function Approximation for Reinforcement Learning. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pp. 1017–1024 (2009). DOI: 10.1145/1553374.1553504
Jianqing Fan. Design-adaptive Nonparametric Regression. Journal of the American Statistical Association, vol. 87, pp. 998–1004 (1992). DOI: 10.1080/01621459.1992.10476255
Xin Xu, Dewen Hu, Xicheng Lu. Kernel-Based Least Squares Policy Iteration for Reinforcement Learning. IEEE Transactions on Neural Networks, vol. 18, pp. 973–992 (2007). DOI: 10.1109/TNN.2007.899161