Hierarchical Optimal Control of MDPs

Authors: Balaraman Ravindran, Satinder Singh, Richard S. Sutton, Doina Precup, Amy McGovern

DOI:

Keywords:

Abstract: Fundamental to reinforcement learning, as well as to the theory of systems and control, is the problem of representing knowledge about the environment and about possible courses of action hierarchically, at a multiplicity of interrelated temporal scales. For example, a human traveler must decide which cities to go to, whether to fly, drive, or walk, and the individual muscle contractions involved in each step. In this paper we survey a new approach to reinforcement learning in which each of these levels of decision making is treated uniformly. Each low-level or high-level course of action is represented as an option, a (sub)controller together with a termination condition. The theory of options is based on the theories of Markov and semi-Markov decision processes, but extends them in significant ways. Options can be used in place of actions in all the planning and learning methods conventionally used in reinforcement learning. Models of options can be learned for a wide variety of different subtasks, and then rapidly combined to solve new tasks. Options enable planning and learning simultaneously at multiple time scales, toward substantially increasing the efficiency and abilities of reinforcement learning systems.
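
The abstract's central claim, that an option is a (sub)controller with a termination condition and can be used wherever a primitive action could be, can be made concrete with a short sketch. The Python fragment below is a minimal illustration under assumed interfaces: the Option class, the env_step(state, action) callback, and the helper names execute_option and smdp_q_update are illustrative and not taken from the paper, while the update rule follows the standard SMDP Q-learning backup used in the options literature cited below.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = Hashable


@dataclass(eq=False)
class Option:
    """A temporally extended course of action: a (sub)controller plus a
    termination condition, usable wherever a primitive action could be."""
    initiation_set: Set[State]          # states in which the option may be invoked
    policy: Callable[[State], Action]   # the (sub)controller
    beta: Callable[[State], float]      # probability of terminating in each state


def execute_option(env_step, state, option, gamma=0.9):
    """Follow an option's policy until its termination condition fires.

    env_step(state, action) -> (next_state, reward) is an assumed interface.
    Returns the discounted reward accumulated along the way, the state in
    which the option terminated, and the number of elapsed steps k.
    """
    total, discount, k = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = env_step(state, action)
        total += discount * reward
        discount *= gamma
        k += 1
        if random.random() < option.beta(state):
            return total, state, k


def smdp_q_update(Q, state, option, reward, next_state, k, options,
                  alpha=0.1, gamma=0.9):
    """One SMDP Q-learning backup for a completed option execution.

    The only change from ordinary Q-learning is that the value of the
    terminating state is discounted by gamma**k, because the option
    lasted k time steps rather than one.
    """
    available = [o for o in options if next_state in o.initiation_set]
    best_next = max((Q.get((next_state, o), 0.0) for o in available), default=0.0)
    old = Q.get((state, option), 0.0)
    Q[(state, option)] = old + alpha * (reward + gamma ** k * best_next - old)
```

Because an option runs for a variable number of steps k, the backup discounts the value of the terminating state by gamma**k; if every option is a single primitive action (k = 1, beta always 1), the update reduces to ordinary one-step Q-learning, which is one sense in which the framework treats low-level and high-level courses of action uniformly.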

References (16)
Satinder P. Singh, Reinforcement learning with a hierarchy of abstract models. National Conference on Artificial Intelligence, pp. 202-207 (1992).
Thomas G. Dietterich, The MAXQ Method for Hierarchical Reinforcement Learning. International Conference on Machine Learning, pp. 118-126 (1998).
Doina Precup, Satinder P. Singh, Richard S. Sutton, Intra-Option Learning about Temporally Abstract Actions. International Conference on Machine Learning, pp. 556-564 (1998).
Leslie Pack Kaelbling, Hierarchical learning in stochastic domains: preliminary results. International Conference on Machine Learning, pp. 167-173 (1993). DOI: 10.1016/B978-1-55860-307-3.50028-9
Long-Ji Lin, Reinforcement learning for robots using neural networks. Carnegie Mellon University (1992).
Doina Precup, Richard S. Sutton, Satinder Singh, Theoretical Results on Reinforcement Learning with Temporally Abstract Options. European Conference on Machine Learning, pp. 382-393 (1998). DOI: 10.1007/BFB0026709
Manfred Huber, Roderic A. Grupen, A feedback control structure for on-line learning tasks. Robotics and Autonomous Systems, vol. 22, pp. 303-315 (1997). DOI: 10.1016/S0921-8890(97)00044-4
Pat Langley, Editorial: On Machine Learning. Machine Learning, vol. 1, pp. 5-10 (1986). DOI: 10.1023/A:1022687019898
D. Precup, S. Singh, R. S. Sutton, Between MDPs and Semi-MDPs: Learning, Planning and Representing Knowledge at Multiple Temporal Scales. University of Massachusetts (1998).
Sebastian Thrun, Anton Schwartz, Finding Structure in Reinforcement Learning. Neural Information Processing Systems, vol. 7, pp. 385-392 (1994).