作者: Kevin G. Jamieson , Lillian J. Ratliff , Tanner Fiez , Lalit Jain
DOI:
关键词:
摘要: In this paper we introduce the pure exploration transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, items $\mathcal{Z}\subset fixed confidence $\delta$, and an unknown vector $\theta^{\ast}\in goal is to infer $\arg\max_{z\in \mathcal{Z}} z^\top\theta^\ast$ with probability $1-\delta$ by making as few sequentially chosen noisy measurements form $x^\top\theta^{\ast}$ possible. When $\mathcal{X}=\mathcal{Z}$, setting generalizes bandits, when $\mathcal{X}$ standard basis \{0,1\}^d$, combinatorial bandits. The naturally arises limited due factors such availability or cost. As example, in drug discovery compounds dosages practitioner may be willing evaluate lab vitro cost safety reasons differ vastly from those $\mathcal{Z}$ that can safely administered patients vivo. Alternatively, recommender systems for books, books user queried about restricted known best-sellers even though might recommend more esoteric titles $\mathcal{Z}$. paper, provide instance-dependent lower bounds setting, algorithm matches these up logarithmic factors, evaluation. particular, present first non-asymptotic bandits nearly achieves information-theoretic bound.