Authors: Thomas Dueholm Hansen, Uri Zwick
DOI: 10.1007/978-3-642-17517-6_37
Keywords:
Abstract: Howard's policy iteration algorithm is one of the most widely used algorithms for finding optimal policies for controlling Markov Decision Processes (MDPs). When applied to weighted directed graphs, which may be viewed as Deterministic MDPs (DMDPs), Howard's algorithm can be used to find Minimum Mean-Cost Cycles (MMCC). Experimental studies suggest that Howard's algorithm works extremely well in this context. The theoretical complexity of Howard's algorithm for finding MMCCs is a mystery; no polynomial time bound is known on its running time. Prior to this work, only linear lower bounds were known on the number of iterations performed by the algorithm. We provide the first graphs on which Howard's algorithm performs Ω(n²) iterations, where n is the number of vertices in the graph.
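For orientation, the sketch below shows the generic form of Howard's policy iteration specialized to a deterministic MDP (a weighted directed graph in which every vertex keeps exactly one outgoing edge per policy), alternating value determination and policy improvement until no switch helps. It is a minimal illustration of the standard algorithm under assumed conventions (adjacency-dict graph format, function name `howard_min_mean_cycle`, floating-point tolerance), not the paper's lower-bound construction.

```python
def howard_min_mean_cycle(graph, eps=1e-9):
    """Howard's policy iteration on a deterministic MDP (sketch).

    graph: dict mapping each vertex to a non-empty list of (successor, cost)
    edges; every vertex must have at least one outgoing edge.
    Returns the minimum mean cost over all directed cycles in the graph.
    """
    # Initial policy: every vertex commits to its first outgoing edge.
    policy = {v: edges[0] for v, edges in graph.items()}

    while True:
        # --- Value determination -------------------------------------
        # Under a fixed policy the graph is functional, so each vertex
        # flows into a unique cycle.  gain[v] is the mean cost of that
        # cycle; bias[v] is the relative cost of reaching it, normalized
        # to 0 at one vertex per cycle.
        gain, bias = {}, {}
        for start in graph:
            if start in gain:
                continue
            path, pos = [], {}
            v = start
            while v not in pos and v not in gain:
                pos[v] = len(path)
                path.append(v)
                v = policy[v][0]
            if v not in gain:
                # path[pos[v]:] is a newly discovered policy cycle.
                cycle = path[pos[v]:]
                g = sum(policy[w][1] for w in cycle) / len(cycle)
                gain[cycle[0]], bias[cycle[0]] = g, 0.0
                for w in reversed(cycle[1:]):
                    nxt, cost = policy[w]
                    gain[w] = g
                    bias[w] = cost - g + bias[nxt]
                path = path[:pos[v]]
            # Propagate values backwards along the tree part of the walk.
            for w in reversed(path):
                nxt, cost = policy[w]
                gain[w] = gain[nxt]
                bias[w] = cost - gain[nxt] + bias[nxt]

        # --- Policy improvement --------------------------------------
        # Switch a vertex to an edge that lexicographically decreases
        # (gain, bias); stop when no vertex can improve.
        improved = False
        for v, edges in graph.items():
            best_g, best_h = gain[v], bias[v]
            for u, cost in edges:
                cand_g, cand_h = gain[u], cost - gain[u] + bias[u]
                if cand_g < best_g - eps or (
                    abs(cand_g - best_g) <= eps and cand_h < best_h - eps
                ):
                    best_g, best_h = cand_g, cand_h
                    policy[v] = (u, cost)
                    improved = True
        if not improved:
            # At a fixed point, gain[v] is the minimum mean cost of a
            # cycle reachable from v; the global minimum is the MMCC value.
            return min(gain.values())
```

The abstract's point is that, despite the simplicity of these two alternating steps and their excellent behavior in experiments, no polynomial bound on the number of iterations is known, and the paper exhibits graphs forcing Ω(n²) iterations.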