Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

Authors: Richard S. Sutton, Doina Precup, Satinder Singh

DOI: 10.1016/S0004-3702(99)00052-1

Keywords:

Abstract: Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options—closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. These results are obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.
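The abstract describes options as closed-loop policies with an initiation set, an internal policy, and a termination condition, and notes that they can be used interchangeably with primitive actions in learning methods such as Q-learning. The sketch below is only an illustration of that idea under assumptions of my own (the `Option` class, the `smdp_q_update` function, and all parameter names are hypothetical, not taken from the paper): an option is represented as a triple (I, π, β), and a single SMDP-style Q-learning backup is applied after an option runs for k steps.

```python
from collections import defaultdict

# Illustrative sketch, not the paper's code: an option as a triple
# (initiation set I, intra-option policy pi, termination condition beta).
class Option:
    def __init__(self, initiation_set, policy, termination_prob):
        self.initiation_set = initiation_set      # states where the option may be started
        self.policy = policy                      # maps state -> primitive action
        self.termination_prob = termination_prob  # maps state -> probability of terminating

def smdp_q_update(Q, s, o, cumulative_reward, k, s_next, options,
                  alpha=0.1, gamma=0.9):
    """One SMDP Q-learning backup after option o ran for k steps from s to
    s_next, where cumulative_reward is the discounted reward collected
    along the way. Q is keyed by (state, option)."""
    available = [op for op in options if s_next in op.initiation_set]
    best_next = max((Q[(s_next, op)] for op in available), default=0.0)
    Q[(s, o)] += alpha * (cumulative_reward + gamma**k * best_next - Q[(s, o)])
    return Q

# Minimal usage: a table of option values updated after one option execution.
Q = defaultdict(float)
```

The key point the sketch tries to capture is the one stated in the abstract: because a set of options over an MDP forms an SMDP, the familiar one-step Q-learning backup carries over with the single change that the discount is raised to the power of the option's duration k.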
