Learning to Search Better than Your Teacher

Authors: Hal Daumé, Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

DOI:

Keywords: Machine learning, Computer science, Artificial intelligence, Search algorithm, Regret, Structured prediction, Work (electrical)

Abstract: Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

References (19)
Joakim Nivre. An efficient algorithm for projective dependency parsing. International Workshop/Conference on Parsing Technologies, pp. 149-160 (2003).
J. Andrew Bagnell, Stéphane Ross. Reinforcement and Imitation Learning via Interactive No-Regret Learning. arXiv: Learning (2014).
Nicolo Cesa-Bianchi, Gabor Lugosi. Prediction, Learning, and Games (2006).
John Langford, Alina Beygelzimer. Sensitive error correcting output codes. Conference on Learning Theory, pp. 158-172 (2005). DOI: 10.1007/11503415_11
Mitch Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, vol. 19, pp. 313-330 (1993). DOI: 10.21236/ADA273556
Hal Daumé, John Langford, Stéphane Ross. Efficient programmable learning to search. arXiv: Learning (2014).
Hal Daumé, Daniel Marcu. Learning as search optimization. Proceedings of the 22nd International Conference on Machine Learning (ICML '05), pp. 169-176 (2005). DOI: 10.1145/1102351.1102373
H. L. Abbott, M. Katchalski. On the snake in the box problem. Journal of Combinatorial Theory, Series B, vol. 45, pp. 13-24 (1988). DOI: 10.1016/0095-8956(88)90051-2
Yoav Goldberg, Joakim Nivre. Training Deterministic Parsers with Non-Deterministic Oracles. Transactions of the Association for Computational Linguistics, vol. 1, pp. 403-414 (2013). DOI: 10.1162/TACL_A_00237
J. R. Doppa, A. Fern, P. Tadepalli. HC-search: a learning framework for search-based structured prediction. Journal of Artificial Intelligence Research, vol. 50, pp. 369-407 (2014). DOI: 10.1613/JAIR.4212