Multi-timescale nexting in a reinforcement learning robot

Authors: Joseph Modayil, Adam White, Richard S. Sutton

DOI: 10.1177/1059712313511648

Abstract: The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state, and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
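The prediction scheme summarized in the abstract can be illustrated with a minimal sketch of linear TD(λ) for a single prediction. This is not the authors' code. The class name `TDLambdaPredictor`, the helper `timescale_to_gamma`, the step-size and trace parameters, and the conversion γ = 1 − 1/(timescale in steps) are all assumptions made for illustration. Only the 0.1 s update interval (ten updates per second) and the 0.1 to 8 second range come from the abstract.

```python
import numpy as np


def timescale_to_gamma(timescale_s, step_s=0.1):
    """Hypothetical helper: pick the discount rate whose horizon 1/(1-gamma),
    measured in time steps of length step_s, matches the requested timescale."""
    steps = timescale_s / step_s
    return 1.0 - 1.0 / steps


class TDLambdaPredictor:
    """One nexting-style prediction learned by linear TD(lambda).

    The prediction is the dot product of a weight vector with the feature
    vector of the current state; the target is the discounted sum of a
    pseudo reward (any function of the sensory input signals)."""

    def __init__(self, n_features, gamma, lam=0.9, alpha=0.1):
        self.w = np.zeros(n_features)  # learned weights
        self.z = np.zeros(n_features)  # accumulating eligibility trace
        self.gamma = gamma             # discount rate (sets the timescale)
        self.lam = lam                 # trace-decay parameter (assumed value)
        self.alpha = alpha             # step size (assumed value)

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, pseudo_reward, x_next):
        # TD error: one-step target minus the current prediction.
        delta = (pseudo_reward
                 + self.gamma * self.predict(x_next)
                 - self.predict(x))
        # Decay the trace, then add the features of the current state.
        self.z = self.gamma * self.lam * self.z + x
        # Move the weights along the trace, scaled by the TD error.
        self.w += self.alpha * delta * self.z
        return delta


if __name__ == "__main__":
    # Toy check: with one constant feature and a constant pseudo reward of 1,
    # the prediction should approach 1 / (1 - gamma).
    gamma = timescale_to_gamma(0.2)  # a two-step horizon, so gamma = 0.5
    predictor = TDLambdaPredictor(n_features=1, gamma=gamma)
    x = np.ones(1)
    for _ in range(2000):
        predictor.update(x, 1.0, x)
    print(predictor.predict(x), 1.0 / (1.0 - gamma))
```

Under these assumptions, thousands of such predictors could share one feature vector and differ only in pseudo reward and discount rate, which keeps the cost per update linear in the number of features for each prediction.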

References (60)
Satinder P. Singh, Reinforcement learning with a hierarchy of abstract models. National Conference on Artificial Intelligence, pp. 202-207 (1992)
Olivier Sigaud, Martin V. Butz, Giovanni Pezzulo, Gianluca Baldassarre, Anticipatory Behavior in Adaptive Learning Systems (2008)
Michael Gabriel, John Moore, Learning and Computational Neuroscience: Foundations of Adaptive Networks. MIT Press (1990)
Csaba Szepesvári, Hamid Reza Maei, Richard S. Sutton, A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Neural Information Processing Systems, pp. 1609-1616 (2008)
Damien Ernst, Arthur Louette, Introduction to Reinforcement Learning. MIT Press (1998)
Richard S. Sutton, Beyond reward: the problem of knowledge and data. Inductive Logic Programming, pp. 2-6 (2011), 10.1007/978-3-642-31951-8_2