作者: Odalric-Ambrym Maillard , Mohammad Sadegh Talebi
DOI:
关键词:
摘要: We study the problem of learning transition matrices a set Markov chains from single stream observations on each chain. assume that are ergodic but otherwise unknown. The learner can sample sequentially to observe their states. goal is select various learn uniformly well with respect some loss function. introduce notion naturally extends squared for distributions case chains, and further characterize being \emph{uniformly good} in all instances. present novel algorithm efficiently balances \emph{exploration} \emph{exploitation} intrinsic this problem, without any prior knowledge chains. provide finite-sample PAC-type guarantees performance algorithm. Further, we show our asymptotically attains an optimal loss.