Normal bandits of unknown means and variances

Authors: Michael N. Katehakis, Wesley Cowan, Junya Honda

DOI: 10.5555/3122009.3242011

Keywords: Random variable, Regret, Population, Combinatorics, Outcome (probability), Open problem, Mathematics, Sequence, Asymptotically optimal algorithm, Sample (statistics)

Abstract: Consider the problem of sampling sequentially from a finite number N ≥ 2 of populations, specified by random variables X_k^i, i = 1, ..., N, and k = 1, 2, ..., where X_k^i denotes the outcome from population i the kth time it is sampled. It is assumed that for each fixed i, {X_k^i}_{k≥1} is a sequence of i.i.d. normal random variables, with unknown mean μ_i and unknown variance σ_i². The objective is to have a policy π for deciding from which of the N populations to sample at any time t = 1, 2, ... so as to maximize the expected sum of outcomes of n total samples or, equivalently, to minimize the regret due to lack of information about the parameters μ_i and σ_i². In this paper, we present a simple inflated sample mean (ISM) index policy that is asymptotically optimal in the sense of Theorem 4 below. This resolves a standing open problem from Burnetas and Katehakis (1996b). Additionally, finite horizon regret bounds are given.
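The setting in the abstract can be illustrated with a small simulation. The following is only a minimal sketch, not the paper's stated algorithm: the abstract does not give the index formula, so the inflation term used here (sample standard deviation times sqrt(t^(2/(T_i - 2)) - 1), where T_i is the number of samples taken from population i), the three initial samples per population, the use of the biased sample variance, and the names `ism_index` and `run_ism` are all assumptions made for illustration. The regret reported is the pseudo-regret, that is, the sum over populations of the gap to the best mean times the number of samples taken.

```python
import math
import random


def ism_index(mean, var, pulls, t):
    """Sample mean plus an inflation term that shrinks as pulls grows."""
    # Assumed form (not given in the abstract); requires pulls >= 3.
    return mean + math.sqrt(var) * math.sqrt(t ** (2.0 / (pulls - 2)) - 1.0)


def run_ism(means, sds, horizon, seed=0):
    """Simulate the sketch policy; return (sample counts, pseudo-regret)."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    sq_sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= 3 * n_arms:
            # Assumed initialization: three samples from each population.
            arm = (t - 1) % n_arms
        else:
            scores = []
            for i in range(n_arms):
                m = sums[i] / counts[i]
                # Biased sample variance, clamped against rounding error.
                v = max(sq_sums[i] / counts[i] - m * m, 0.0)
                scores.append(ism_index(m, v, counts[i], t))
            arm = max(range(n_arms), key=scores.__getitem__)
        x = rng.gauss(means[arm], sds[arm])
        counts[arm] += 1
        sums[arm] += x
        sq_sums[arm] += x * x
    best = max(means)
    regret = sum((best - means[i]) * counts[i] for i in range(n_arms))
    return counts, regret
```

For example, `run_ism([0.0, 1.0], [1.0, 1.0], horizon=2000)` simulates two populations with a unit gap in means. The policy itself never reads `means` or `sds`; they are used only to draw outcomes and to score the regret afterwards.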