DOI:
Keywords:
Abstract: In this chapter, we study a novel variant of the UCB algorithm, Efficient-UCB-Variance (EUCBV), for minimizing cumulative regret in the stochastic multi-armed bandit (SMAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved (Auer and Ortner, 2010) while using variance estimates to compute the arms’ confidence bounds, similar to UCBV (Audibert et al., 2009). Through a theoretical analysis, we establish that EUCBV incurs a gap-dependent regret bound of O