Authors: Richard S. Sutton, Doina Precup, Satinder Singh
DOI: 10.1016/S0004-3702(99)00052-1
Keywords:
Abstract: Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key, longstanding challenges for AI. In this paper we consider how these challenges can be addressed within the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action in this framework to include options—closed-loop policies for taking action over a period of time. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Overall, we show that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way. In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning. Formally, a set of options defined over an MDP constitutes a semi-Markov decision process (SMDP), and the theory of SMDPs provides the foundation for the theory of options. However, the most interesting issues concern the interplay between the underlying MDP and the SMDP and are thus beyond SMDP theory. We present results for three such cases: 1) we show that the results of planning with options can be used during execution to interrupt options and thereby perform even better than planned, 2) we introduce new intra-option methods that are able to learn about an option from fragments of its execution, and 3) we propose a notion of subgoal that can be used to improve the options themselves. All of these results have precursors in the existing literature; the contribution of this paper is to establish them in a simpler and more general setting with fewer changes to the existing reinforcement learning framework. These results are obtained without committing to (or ruling out) any particular approach to state abstraction, hierarchy, function approximation, or the macro-utility problem.
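
The following is a minimal sketch, not taken from the paper itself, of the option construct described in the abstract (an initiation set I, an intra-option policy pi, and a termination condition beta) together with an SMDP-style Q-learning backup in which a completed option is treated as a single temporally extended action. All names here (Option, smdp_q_update, the parameter choices) are illustrative assumptions, not the paper's API.

# Illustrative sketch only; names and signatures are assumptions, not from the paper.
from dataclasses import dataclass
from typing import Callable, Dict, Set

State = int
Action = int

@dataclass
class Option:
    initiation_set: Set[State]              # I: states in which the option may be started
    policy: Callable[[State], Action]       # pi: closed-loop policy followed while the option runs
    termination: Callable[[State], float]   # beta: probability of terminating in each state

def smdp_q_update(Q: Dict[State, Dict[int, float]],
                  s: State, o: int, discounted_reward: float,
                  s_next: State, k: int,
                  alpha: float = 0.1, gamma: float = 0.95) -> None:
    """One SMDP Q-learning backup after option o, started in state s, ran for
    k steps and ended in s_next; discounted_reward is the return accumulated
    (already discounted step by step) while the option executed."""
    target = discounted_reward + (gamma ** k) * max(Q[s_next].values())
    Q[s][o] += alpha * (target - Q[s][o])

Under this view a primitive action is simply an option that is available everywhere, follows itself for one step, and then terminates (k = 1), which is why options and primitive actions can be used interchangeably in planning and learning updates of this form.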