Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

Authors: Tanguy Urvoy, Stefan Riezler, Artem Sokolov

Abstract: We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1-BLEU loss evaluation of a predicted translation instead of obtaining a gold standard reference translation. In our experiment, bandit feedback is obtained by evaluating BLEU on reference translations without revealing them to the algorithm. This can be thought of as a simulation of interactive machine translation where an SMT system is personalized by a user who provides single point feedback to predicted translations. Our experiments show that our approach improves translation quality and is comparable to approaches that employ more informative feedback in learning.
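The learning setup described in the abstract can be illustrated with a small sketch. The abstract does not spell out the update rule, so everything below is an assumption made for illustration: a log-linear (Gibbs) model over a fixed candidate list, a score-function stochastic gradient on the expected loss, and the hypothetical names `gibbs_probs`, `bandit_step`, and `loss_fn`. The learner samples one candidate, observes only that candidate's loss (for example 1-BLEU), and never sees a reference structure.

```python
import math
import random


def gibbs_probs(weights, candidates):
    """Softmax distribution over candidates, scored by the dot product w . phi(y)."""
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bandit_step(weights, candidates, loss_fn, lr, rng):
    """One update from single-point feedback (illustrative, not the paper's exact method).

    candidates: list of feature vectors phi(y), one per candidate structure.
    loss_fn:    hypothetical feedback oracle; called once, on the sampled index only.
    """
    probs = gibbs_probs(weights, candidates)
    # Predict by sampling a single structure from the current model.
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    # The only supervision: the task loss of the sampled structure.
    loss = loss_fn(idx)
    dim = len(weights)
    # Expected feature vector under the current model.
    expected = [sum(p * phi[d] for p, phi in zip(probs, candidates)) for d in range(dim)]
    # Unbiased estimate of the gradient of the expected loss:
    # loss(y) * (phi(y) - E[phi]).
    grad = [loss * (candidates[idx][d] - expected[d]) for d in range(dim)]
    new_weights = [w - lr * g for w, g in zip(weights, grad)]
    return new_weights, idx, loss
```

In a reranking setting, `candidates` would hold the feature vectors of an n-best list and `loss_fn` would return the 1-BLEU score of the chosen translation, obtained from a user or a simulation. Both are stand-ins here rather than details taken from the abstract.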
