Authors: Balaraman Ravindran, Singh Satinder, Sutton S, Precup Doina, McGovern Amy
DOI:
Keywords:
Abstract: Fundamental to reinforcement learning, as well as to the theory of systems and control, is the problem of representing knowledge about the environment and about possible courses of action hierarchically, at a multiplicity of interrelated temporal scales. For example, a human traveler must decide which cities to go to, whether to fly, drive, or walk, and the individual muscle contractions involved in each step. In this paper we survey a new approach to reinforcement learning in which each of these decisions is treated uniformly. Each low-level action and high-level course of action is represented as an option, a (sub)controller and a termination condition. The theory of options is based on the theories of Markov and semi-Markov decision processes, but extends these in significant ways. Options can be used in place of actions in all the planning and learning methods conventionally used in reinforcement learning. Options and models of options can be learned for a wide variety of different subtasks, and then rapidly combined to solve new tasks. Options enable planning and learning simultaneously at a wide variety of time scales, and toward a wide variety of subtasks, substantially increasing the efficiency and abilities of reinforcement learning systems.