Context-dependent upper-confidence bounds for directed exploration

Authors: Adam White, Raksha Kumaraswamy, Martha White, Matthew Schlegel

Abstract: Directed exploration strategies for reinforcement learning are critical for learning an optimal policy in a minimal number of interactions with the environment. Many algorithms use optimism to direct exploration, either through visitation estimates or upper confidence bounds, as opposed to data-inefficient strategies like ε-greedy that use random, undirected exploration. Most data-efficient exploration methods require significant computation, typically relying on a learned model to guide exploration. Least-squares methods have the potential to provide some of the data-efficiency benefits of model-based approaches, because they summarize past interactions, with computation closer to that of model-free approaches. In this work, we provide a novel, computationally efficient, incremental exploration strategy, leveraging this property of least-squares temporal difference learning (LSTD). We derive upper confidence bounds on the action-values learned by LSTD, with context-dependent (or state-dependent) noise variance. Such context-dependent noise focuses exploration on a subset of variable states, and allows for reduced exploration in other states. We empirically demonstrate that our algorithm can converge more quickly than other incremental exploration strategies using confidence estimates on action-values.
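To make the mechanism described in the abstract concrete, the sketch below illustrates one plausible reading of the approach: maintaining an incremental LSTD solution and selecting actions by an upper confidence bound, with the confidence width scaled by a per-context noise estimate. This is a minimal sketch in the spirit of linear UCB, not the paper's exact algorithm or derivation; the class name `LSTDUCB`, the bonus form, and the `noise_std` argument are illustrative assumptions.

```python
import numpy as np

class LSTDUCB:
    """Illustrative LSTD with an upper-confidence bonus (assumed form, not the paper's exact method)."""

    def __init__(self, num_features, gamma=0.99, reg=1.0, beta=1.0):
        self.gamma = gamma
        self.beta = beta                            # bonus scale (hypothetical parameter)
        self.A_inv = np.eye(num_features) / reg     # inverse of regularized LSTD matrix A
        self.C_inv = np.eye(num_features) / reg     # inverse Gram matrix used for the bound
        self.b = np.zeros(num_features)

    def update(self, phi, reward, phi_next):
        # Sherman-Morrison rank-one updates keep each step O(d^2), which is
        # the kind of incremental, model-free-like computation the abstract emphasizes.
        u = phi - self.gamma * phi_next
        self.A_inv -= (self.A_inv @ np.outer(phi, u) @ self.A_inv) / (1.0 + u @ self.A_inv @ phi)
        self.C_inv -= (self.C_inv @ np.outer(phi, phi) @ self.C_inv) / (1.0 + phi @ self.C_inv @ phi)
        self.b += reward * phi

    def upper_bound(self, phi, noise_std=1.0):
        # Optimistic value: LSTD estimate plus a confidence-width bonus.
        # `noise_std` stands in for a context-dependent noise estimate, so
        # high-variance states receive more exploration than stable ones.
        q_hat = phi @ (self.A_inv @ self.b)
        bonus = self.beta * noise_std * np.sqrt(phi @ self.C_inv @ phi)
        return q_hat + bonus

# Example: pick the action whose features have the largest optimistic value.
agent = LSTDUCB(num_features=4)
phis = [np.eye(4)[a] for a in range(4)]          # one-hot action features (toy setup)
agent.update(phis[0], reward=1.0, phi_next=phis[1])
best = max(range(4), key=lambda a: agent.upper_bound(phis[a]))
```

The key design point the sketch tries to convey is that the least-squares matrices already summarize past interactions, so both the value estimate and its confidence width come from quantities the agent maintains incrementally, without a learned environment model.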
