作者: Robert Kleinberg , Aleksandrs Slivkins , Eli Upfal
DOI: 10.1145/3299873
关键词:
摘要: In a multi-armed bandit problem, an online algorithm chooses from set of strategies in sequence trials to maximize the total payoff chosen strategies. While performance algorithms with small finite strategy is well understood, problems large sets are still topic active investigation, motivated by practical applications, such as auctions and web advertisement. The goal research identify broad natural classes functions that enable design efficient solutions.In this work, we study general setting for which form metric space, function satisfies Lipschitz condition respect metric. We refer problem MAB problem. present solution setting. That is, every define isometry invariant bounds below comes arbitrarily close meeting bound. Furthermore, our technique gives even better results benign functions. also address full-feedback (“best expert”) version where after round payoffs all arms revealed.