Learning to Search Better than Your Teacher

Authors: Hal Daumé III, Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

DOI:

Keywords: Machine learning, Computer science, Artificial intelligence, Search algorithm, Regret, Structured prediction, Work (electrical)

Abstract: Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.
