Authors: Joseph Modayil, Adam White, Richard S. Sutton
Keywords:
Abstract: The term 'nexting' has been used by psychologists to refer to the propensity of people and many other animals to continually predict what will happen next in an immediate, local, and personal sense. The ability to 'next' constitutes a basic kind of awareness and knowledge of one's environment. In this paper we present results with a robot that learns to next in real time, making thousands of predictions about sensory input signals at timescales from 0.1 to 8 seconds. Our predictions are formulated as a generalization of the value functions commonly used in reinforcement learning, where now an arbitrary function of the sensory input signals is used as a pseudo reward, and the discount rate determines the timescale. We show that six thousand predictions, each computed as a function of six thousand features of the state, can be learned and updated online ten times per second on a laptop computer, using the standard temporal-difference(λ) algorithm with linear function approximation. This approach is sufficiently computationally efficient to be used for real-time learning on the robot and sufficiently data efficient to achieve substantial accuracy within 30 minutes. Moreover, a single tile-coded feature representation suffices to accurately predict many different signals over a significant range of timescales. We also extend nexting beyond simple timescales by letting the discount rate be a function of the state, and show that nexting predictions of this more general form can also be learned with substantial accuracy. General nexting provides a simple yet powerful mechanism for a robot to acquire predictive knowledge of the dynamics of its environment.
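The formulation in the abstract (a pseudo reward, a discount rate that sets the timescale, and temporal-difference(λ) with linear function approximation) can be illustrated with a minimal sketch. This is not the authors' code. The names `discount_for_timescale` and `TDLambdaPredictor`, the step-size normalization by the number of active features, and the default values of `lam` and `alpha` are assumptions made for illustration. The sketch also assumes the common convention that a discount rate γ corresponds to a horizon of roughly 1/(1 − γ) time steps, and a 0.1-second time step to match the ten updates per second mentioned in the abstract.

```python
def discount_for_timescale(seconds, step_seconds=0.1):
    """Discount rate whose horizon of 1/(1 - gamma) steps spans `seconds`.

    Assumes the usual horizon convention; step_seconds=0.1 matches
    ten updates per second.
    """
    steps = seconds / step_seconds
    return 1.0 - 1.0 / steps


class TDLambdaPredictor:
    """One prediction learned by linear TD(lambda) over sparse binary features.

    Hypothetical helper class: a feature vector is given as a list of the
    indices of its active (value 1) features, as tile coding would produce.
    """

    def __init__(self, num_features, gamma, lam=0.9, alpha=0.1):
        self.w = [0.0] * num_features  # weight vector
        self.z = [0.0] * num_features  # eligibility trace
        self.gamma = gamma
        self.lam = lam
        self.alpha = alpha

    def predict(self, active):
        # Linear prediction: sum of the weights of the active features.
        return sum(self.w[i] for i in active)

    def update(self, active, pseudo_reward, next_active):
        # TD error: pseudo reward plus discounted next prediction,
        # minus the current prediction.
        delta = (pseudo_reward
                 + self.gamma * self.predict(next_active)
                 - self.predict(active))
        # Decay the trace, then add the current feature vector to it.
        decay = self.gamma * self.lam
        for i in range(len(self.z)):
            self.z[i] *= decay
        for i in active:
            self.z[i] += 1.0
        # Assumption: scale the step size by the number of active features.
        step = self.alpha / max(len(active), 1)
        for i in range(len(self.w)):
            self.w[i] += step * delta * self.z[i]
        return delta


# Toy check: a constant pseudo reward of 1 with gamma = 0.9 has an ideal
# prediction of 1 / (1 - 0.9) = 10, which the learned value approaches.
predictor = TDLambdaPredictor(num_features=1, gamma=0.9)
for _ in range(2000):
    predictor.update([0], 1.0, [0])
print(round(predictor.predict([0]), 3))
```

Under these assumptions, a robot making thousands of predictions would hold one such weight vector per prediction, all sharing the same feature vector. The dense loops over `z` and `w` are kept for readability; a real-time system would update only the nonzero trace entries.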