Author: John Fearnley
DOI: 10.1007/978-3-642-14162-1_46
Keywords:
Abstract: We study policy iteration for infinite-horizon Markov decision processes. It has recently been shown that policy-iteration-style algorithms have exponential lower bounds in a two-player game setting. We extend these lower bounds to Markov decision processes under the total-reward and average-reward optimality criteria.