Authors: Nicolò Cesa-Bianchi, Sébastien Bubeck
DOI:
Keywords:
Abstract: A multi-armed bandit problem - or, simply, a bandit problem - is a sequential allocation problem defined by a set of actions. At each time step, a unit resource is allocated to an action and some observable payoff is obtained. The goal is to maximize the total payoff obtained in a sequence of allocations. The name "bandit" refers to the colloquial term for a slot machine (a "one-armed bandit" in American slang). In a casino, a sequential allocation problem arises when the player is facing many slot machines at once (a "multi-armed bandit") and must repeatedly choose where to insert the next coin. Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this book, the focus is on two extreme cases in which the analysis of regret is particularly simple and elegant: independent and identically distributed payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, it also analyzes some of the most important variants and extensions, such as the contextual bandit model. This monograph is an ideal reference for students and researchers with an interest in bandit problems.