Authors: Odalric-Ambrym Maillard, Timothy A Mann, Shie Mannor
DOI:
Keywords:
Abstract: … RL in an MDP where p represents the transition kernel of a state-action pair and f represents the value function of the MDP for a … Our formal notion of MDP hardness is summarized in …