Authors: Hal Daume, Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
DOI:
Keywords: Machine learning, Computer science, Artificial intelligence, Search algorithm, Regret, Structured prediction, Work (electrical)
Abstract: Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.