Authors: Gabriel Dulac-Arnold, Peter Sunehag, Ben Coppin, Richard Evans
DOI:
Keywords:
Abstract: Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems. Recommender systems, industrial plants and language models are only some of the many real-world tasks involving large numbers of discrete actions for which current methods are difficult or even often impossible to apply. An ability to generalize over the set of actions, as well as sub-linear complexity relative to the size of the set, are both necessary to handle such tasks. Current approaches are not able to provide both of these, which motivates the work in this paper. Our proposed approach leverages prior information about the actions to embed them in a continuous space upon which it can generalize. Additionally, approximate nearest-neighbor methods allow for logarithmic-time lookup complexity relative to the number of actions, which is necessary for time-wise tractable training. This combined approach allows reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods. We demonstrate our algorithm’s abilities on a series of tasks having up to one million actions.
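
The lookup idea in the abstract (embed actions in a continuous space, then retrieve the actions nearest to a query point in sub-linear time) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class name `ActionIndex`, the method `k_nearest`, and the one-dimensional embedding are hypothetical, and an exact binary search over sorted embeddings stands in for the approximate nearest-neighbor methods the abstract mentions.

```python
import bisect


class ActionIndex:
    """Hypothetical index over actions embedded on a line (1-D embeddings)."""

    def __init__(self, embeddings):
        # embeddings[a] is the (assumed) scalar embedding of discrete action a.
        # Sorting once up front lets every later lookup use binary search.
        pairs = sorted((e, a) for a, e in enumerate(embeddings))
        self.keys = [e for e, _ in pairs]
        self.ids = [a for _, a in pairs]

    def k_nearest(self, query, k):
        """Return ids of the k actions closest to `query`, nearest first.

        Cost is O(log n) for the binary search plus O(k) for the expansion.
        """
        hi = bisect.bisect_left(self.keys, query)  # first key >= query
        lo = hi - 1                                # last key < query
        found = []
        while len(found) < k and (lo >= 0 or hi < len(self.keys)):
            # Take the left neighbour if the right side is exhausted or if
            # the left neighbour is at least as close as the right one.
            take_left = hi >= len(self.keys) or (
                lo >= 0 and query - self.keys[lo] <= self.keys[hi] - query
            )
            if take_left:
                found.append(self.ids[lo])
                lo -= 1
            else:
                found.append(self.ids[hi])
                hi += 1
        return found


# Five actions with made-up embeddings; action 2 sits at 20.0, and so on.
index = ActionIndex([30.0, 0.0, 20.0, 40.0, 10.0])
print(index.k_nearest(22.0, 3))  # → [2, 0, 4]
```

In more than one dimension, binary search no longer applies, which is where an approximate nearest-neighbor index would take over the role played by `bisect` in this sketch.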