Authors: Alexander G. Schwing, Jian Peng, Frank S. He
DOI:
Keywords:
Abstract: We propose a novel training algorithm for reinforcement learning that combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our technique makes deep reinforcement learning more practical by drastically reducing training time. We evaluate the performance of our approach on 49 games of the challenging Arcade Learning Environment and report significant improvements in both training time and accuracy.
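The abstract does not spell out the constraints. The sketch below shows one plausible way to "tighten optimality". Its details are assumptions and are not confirmed by this record. It derives lower bounds on Q(s_t, a_t) from discounted multi-step returns observed after step t, and upper bounds from Q-values and rewards observed before step t. It then replaces the hard constraints with quadratic hinge penalties added to the usual one-step Q-learning loss. The function names, the penalty weight, and the bound formulas are hypothetical illustrations, not the authors' implementation. The bounds hold exactly only for deterministic transitions.

```python
from typing import List, Sequence


def lower_bounds(rewards: Sequence[float], next_max_q: Sequence[float], gamma: float) -> List[float]:
    # Hypothetical helper. rewards[k] = r_{t+k}; next_max_q[k] = max_a Q(s_{t+k+1}, a).
    # Bound j: sum_{k=0..j} gamma^k * r_{t+k} + gamma^(j+1) * next_max_q[j]
    bounds = []
    partial_return = 0.0
    for j, (r, q_next) in enumerate(zip(rewards, next_max_q)):
        partial_return += gamma ** j * r
        bounds.append(partial_return + gamma ** (j + 1) * q_next)
    return bounds


def upper_bounds(past_rewards: Sequence[float], past_q: Sequence[float], gamma: float) -> List[float]:
    # Hypothetical helper. past_rewards[k] = r_{t-1-k} (most recent first);
    # past_q[k] = Q(s_{t-1-k}, a_{t-1-k}).
    # Rearranges Q(s_{t-j-1}, a_{t-j-1}) >= sum_{i=0..j} gamma^i * r_{t-j-1+i}
    #                                       + gamma^(j+1) * Q(s_t, a_t)
    bounds = []
    for j in range(min(len(past_rewards), len(past_q))):
        # Rewards r_{t-j-1}, ..., r_{t-1} in chronological order.
        segment = list(reversed(past_rewards[: j + 1]))
        partial_return = sum(gamma ** i * r for i, r in enumerate(segment))
        bounds.append((past_q[j] - partial_return) / gamma ** (j + 1))
    return bounds


def penalized_loss(q_value: float, target: float, lows: Sequence[float],
                   ups: Sequence[float], penalty_weight: float) -> float:
    # Squared one-step TD error plus quadratic penalties that activate only
    # when Q(s_t, a_t) falls below the tightest lower bound or above the
    # tightest upper bound (assumed penalty form).
    td_error = (q_value - target) ** 2
    low_violation = max(0.0, max(lows) - q_value) if lows else 0.0
    up_violation = max(0.0, q_value - min(ups)) if ups else 0.0
    return td_error + penalty_weight * (low_violation ** 2 + up_violation ** 2)


if __name__ == "__main__":
    lows = lower_bounds([1.0, 1.0], [2.0, 4.0], gamma=0.5)  # [2.0, 2.5]
    ups = upper_bounds([1.0], [3.0], gamma=0.5)             # [4.0]
    # The one-step DQN target equals the first lower bound (j = 0).
    print(penalized_loss(2.0, lows[0], lows, ups, penalty_weight=4.0))
```

In the usage example, the one-step target alone would accept Q = 2.0. The two-step return gives the tighter lower bound 2.5, so the penalty pushes Q upward. This illustrates how longer-horizon reward information can propagate faster than it would through one-step updates alone.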