Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots

Authors: Richard S. Sutton, Adam White, Banafsheh Rafiee, Sina Ghiassian

Abstract: The ability to continually make predictions about the world may be central to intelligence. Off-policy learning and general value functions (GVFs) are well-established algorithmic techniques for learning predictions about many signals while interacting with the world. In the past couple of years, ambitious works have used off-policy GVF learning to improve control performance in both simulated and robotic tasks. Many of these works use semi-gradient temporal-difference (TD) algorithms, like Q-learning, which are potentially divergent. In the last decade, several TD algorithms have been proposed that are convergent and computationally efficient, but not much is known about how they perform in practice, especially on robots. In this work, we present an empirical comparison of modern off-policy algorithms on three different robot platforms, providing insights into their strengths and weaknesses. We also discuss the challenges of conducting fair comparative studies on robots and develop a new evaluation methodology that proved successful and applicable in a relatively complicated domain.
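The abstract contrasts potentially divergent semi-gradient TD methods with the convergent gradient-TD family. As an illustrative sketch only (not the paper's implementation; the class names, feature representation, and step sizes below are assumptions), the following compares a linear one-step off-policy semi-gradient TD(0) update with a TDC-style gradient-TD update, both weighted by an importance-sampling ratio for the target policy.

```python
# Minimal sketch, assuming linear function approximation and a single GVF
# prediction target; not the paper's code.
import numpy as np

class OffPolicyTD:
    """Semi-gradient off-policy TD(0) with importance sampling (may diverge)."""
    def __init__(self, n_features, alpha=0.01):
        self.w = np.zeros(n_features)
        self.alpha = alpha

    def update(self, x, cumulant, gamma, x_next, rho):
        # rho = pi(a|s) / b(a|s): ratio of target- to behavior-policy probabilities
        delta = cumulant + gamma * self.w @ x_next - self.w @ x
        self.w += self.alpha * rho * delta * x
        return delta

class TDC:
    """Gradient-TD (TDC) update (Sutton et al., 2009): convergent off-policy."""
    def __init__(self, n_features, alpha=0.01, beta=0.005):
        self.w = np.zeros(n_features)   # primary weights (value estimate)
        self.v = np.zeros(n_features)   # secondary weights (correction term)
        self.alpha, self.beta = alpha, beta

    def update(self, x, cumulant, gamma, x_next, rho):
        delta = cumulant + gamma * self.w @ x_next - self.w @ x
        # primary update subtracts a correction term built from the secondary weights
        self.w += self.alpha * rho * (delta * x - gamma * (x @ self.v) * x_next)
        self.v += self.beta * rho * (delta - x @ self.v) * x
        return delta
```

In both sketches the predicted value of a state is the dot product of its feature vector with the learned weights; the difference is only in how the weights are updated from off-policy samples.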
