Universal Value Function Approximators

Authors: Tom Schaul, Daniel Horgan, David Silver, Karol Gregor

DOI:

Keywords: Embedding, Construct (python library), Value (mathematics), Factoring, Mathematics, Function (mathematics), Artificial intelligence, Supervised learning, State (functional analysis), Reinforcement learning

Abstract: … Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V (s… we introduce universal value function approx…

References (24)
Ronan Collobert, Clément Farabet, Koray Kavukcuoglu. Torch7: A Matlab-like Environment for Machine Learning. Neural Information Processing Systems (2011).
Leslie Pack Kaelbling. Hierarchical learning in stochastic domains: preliminary results. International Conference on Machine Learning, pp. 167-173 (1993). DOI: 10.1016/B978-1-55860-307-3.50028-9
Damien Ernst, Arthur Louette. Introduction to Reinforcement Learning. MIT Press (1998).
David Foster, Peter Dayan. Structure in the Space of Value Functions. Machine Learning, vol. 49, pp. 325-346 (2002). DOI: 10.1023/A:1017944732463
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta. Off-Policy Temporal Difference Learning with Function Approximation. International Conference on Machine Learning, pp. 417-424 (2001).
Volodymyr Mnih, Ioannis Antonoglou, Koray Kavukcuoglu, Daan Wierstra, Martin A. Riedmiller, Alex Graves, David Silver. Playing Atari with Deep Reinforcement Learning. arXiv: Learning (2013).
Marc Peter Deisenroth, Peter Englert, Jan Peters, Dieter Fox. Multi-Task Policy Search for Robotics. International Conference on Robotics and Automation, pp. 3876-3881 (2014). DOI: 10.1109/ICRA.2014.6907421
Jens Kober, Andreas Wilhelm, Erhan Oztop, Jan Peters. Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, vol. 33, pp. 361-379 (2012). DOI: 10.1007/S10514-012-9290-3
Ilya Scheidwasser, George Konidaris, Andrew G. Barto. Transfer in reinforcement learning via shared features. Journal of Machine Learning Research, vol. 13, pp. 1333-1371 (2012). DOI: 10.5555/2188385.2343689
Richard S. Sutton, Doina Precup, Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, vol. 112, pp. 181-211 (1999). DOI: 10.1016/S0004-3702(99)00052-1