Authors: Hal Daumé, Chicheng Zhang, Alekh Agarwal, Sahand N Negahban, John Langford
DOI:
Keywords:
Abstract: We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying signal may be different between these two sources. Theoretically, we state and prove no-regret algorithms for learning that is robust to misaligned cost distributions. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approach is feasible and helpful in practice.