Multi-timescale nexting in a reinforcement learning robot

Authors: Joseph Modayil, Adam White, Richard S. Sutton


Abstract: The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
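The prediction scheme described in the abstract can be illustrated with a minimal sketch of temporal-difference(λ) learning with linear function approximation, where a sensor reading serves as the pseudo reward and the discount rate sets the timescale. This is not the authors' implementation: the names `gamma_for_timescale` and `TDLambdaPredictor`, the step-size and λ defaults, and the use of accumulating traces are assumptions made for illustration. The conversion from seconds to a discount rate assumes the usual horizon of 1/(1 − γ) time steps and the ten-updates-per-second rate mentioned in the abstract.

```python
import numpy as np


def gamma_for_timescale(seconds, step_seconds=0.1):
    """Discount rate whose horizon 1/(1 - gamma) spans `seconds`.

    Assumes one update every `step_seconds` (0.1 s = ten updates per second).
    """
    steps = seconds / step_seconds
    return 1.0 - 1.0 / steps


class TDLambdaPredictor:
    """One linear prediction learned online with TD(lambda).

    Hypothetical helper: hyperparameter defaults are illustrative only.
    """

    def __init__(self, n_features, gamma, lam=0.9, alpha=0.1):
        self.w = np.zeros(n_features)  # weight vector
        self.e = np.zeros(n_features)  # eligibility trace
        self.gamma = gamma
        self.lam = lam
        self.alpha = alpha

    def predict(self, x):
        # Linear function approximation: prediction = w . x
        return float(self.w @ x)

    def update(self, x, pseudo_reward, x_next):
        # TD error: the pseudo reward is the sensor signal observed on the
        # transition from feature vector x to x_next.
        delta = (pseudo_reward
                 + self.gamma * self.predict(x_next)
                 - self.predict(x))
        # Accumulating trace, then the weight update.
        self.e = self.gamma * self.lam * self.e + x
        self.w += self.alpha * delta * self.e
        return delta
```

As a sanity check under these assumptions, a predictor with a single always-active feature and a constant pseudo reward of 1 should approach the discounted sum 1/(1 − γ). In a setting like the one the abstract describes, many such predictors would share one feature vector, with one weight vector per predicted signal and timescale.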


