Authors: Quanquan Gu, Dongruo Zhou, Jiafan He
DOI:
Keywords: Minimax, Markov decision process, Computer science, Uncertainty principle, Regret, Upper and lower bounds, Reinforcement learning, Combinatorics, Discounting, Logarithm
Abstract: … We study the reinforcement learning problem for discounted Markov Decision Processes (MDPs) … factors, which suggests that UCBVI-γ is nearly minimax optimal for discounted MDPs. …