Authors: Doina Precup, Satinder P. Singh, Richard S. Sutton
DOI:
Keywords:
Abstract: Several researchers have proposed modeling temporally abstract actions in reinforcement learning by the combination of a policy and a termination condition, which we refer to as an option. Value functions over options and models of options can be learned using methods designed for semi-Markov decision processes (SMDPs). However, all these methods require an option to be executed to termination. In this paper we explore methods that learn about an option from small fragments of experience consistent with the option, even if the option itself is not executed. We call these methods intra-option learning methods because they learn from experience within an option. Intra-option methods are sometimes much more efficient than SMDP methods because they can use off-policy temporal-difference mechanisms to learn simultaneously about all the options consistent with the experience, not just the few that were actually executing. We present intra-option methods for learning value functions over options and for learning multi-time models of the consequences of options. We present computational examples in which these new methods learn much faster than SMDP methods and learn effectively when SMDP methods cannot learn at all. We also sketch a convergence proof for intra-option value learning.
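To make the intra-option value-learning idea concrete, here is a minimal tabular sketch of the kind of one-step update the abstract describes: after every primitive transition, every option whose policy is consistent with the action actually taken is updated, not only the option that was executing. All names (Q, pi, beta, alpha, gamma and the array shapes) are illustrative assumptions, not the paper's notation or code.

```python
import numpy as np

# Hypothetical tabular setup: each option o has a deterministic policy
# pi[o][s] -> action and a termination probability beta[o][s] in [0, 1].
n_states, n_options = 10, 3
Q = np.zeros((n_states, n_options))   # value of executing option o from state s
alpha, gamma = 0.1, 0.9               # step size and discount (illustrative values)

def intra_option_q_update(s, a, r, s_next, pi, beta):
    """One intra-option value-learning step for a single observed
    transition (s, a, r, s_next): update every option whose policy
    would have selected the action actually taken, even if that
    option was not the one being executed."""
    for o in range(n_options):
        if pi[o][s] != a:
            continue  # experience is not consistent with this option
        # If the option would not terminate in s_next, continue valuing it;
        # otherwise back up the value of switching to the best option.
        u_next = (1.0 - beta[o][s_next]) * Q[s_next, o] \
                 + beta[o][s_next] * Q[s_next].max()
        Q[s, o] += alpha * (r + gamma * u_next - Q[s, o])
```

Because the update is applied off-policy to all consistent options at once, a single stream of primitive experience can improve the value estimates of many options simultaneously, which is the efficiency advantage over SMDP methods claimed in the abstract.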