Authors: Richard S. Sutton, Doina Precup, Satinder Singh
DOI: 10.1016/S0004-3702(99)00052-1
Keywords:
Abstract: Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options—closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. These results are obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.
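
The following is a minimal sketch, not taken from the paper itself, of the option construct described in the abstract (an initiation set I, an intra-option policy pi, and a termination condition beta) together with an SMDP-style Q-learning backup in which a completed option is treated as a single temporally extended action. All names here (Option, smdp_q_update, the parameter choices) are illustrative assumptions, not the paper's API.

# Illustrative sketch only; names and signatures are assumptions, not from the paper.
from dataclasses import dataclass
from typing import Callable, Dict, Set

State = int
Action = int

@dataclass
class Option:
    initiation_set: Set[State]              # I: states in which the option may be started
    policy: Callable[[State], Action]       # pi: closed-loop policy followed while the option runs
    termination: Callable[[State], float]   # beta: probability of terminating in each state

def smdp_q_update(Q: Dict[State, Dict[int, float]],
                  s: State, o: int, discounted_reward: float,
                  s_next: State, k: int,
                  alpha: float = 0.1, gamma: float = 0.95) -> None:
    """One SMDP Q-learning backup after option o, started in state s, ran for
    k steps and ended in s_next; discounted_reward is the return accumulated
    (already discounted step by step) while the option executed."""
    target = discounted_reward + (gamma ** k) * max(Q[s_next].values())
    Q[s][o] += alpha * (target - Q[s][o])

Under this view a primitive action is simply an option that is available everywhere, follows itself for one step, and then terminates (k = 1), which is why options and primitive actions can be used interchangeably in planning and learning updates of this form.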