Vox Populi: Collecting High-Quality Labels from a Crowd

Authors: Ohad Shamir, Ofer Dekel

DOI:

Keywords:

Abstract: With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a fixed distribution. In many cases, the number of teachers actually scales with the number of examples, with each teacher providing just a handful of labels, precluding any statistically reliable assessment of an individual teacher's quality. In this paper, we study the problem of pruning low-quality teachers in a crowd, in order to improve the label quality of our training set. Despite the hurdles mentioned above, we show that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers. We provide a theoretical analysis of our algorithm and back our findings with empirical evidence.
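The abstract does not spell out the pruning algorithm. As a hypothetical illustration of the general idea only, the sketch below scores each teacher by how often their labels agree with a model fit on the whole crowd's data, so no example needs more than one label, and drops teachers at or below an agreement threshold. The nearest-centroid classifier, the function names `train_centroids`, `predict` and `prune_teachers`, and the default threshold of 0.5 are all assumptions made for this example, not the authors' method.

```python
# Hypothetical illustration of crowd pruning; NOT the algorithm from the paper.
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# A labeled example: (feature vector, label in {-1, +1}, teacher id).
Example = Tuple[Sequence[float], int, str]


def train_centroids(data: List[Example]) -> Dict[int, List[float]]:
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for x, y, _ in data:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c: List[float]) -> float:
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(centroids, key=lambda y: dist(centroids[y]))


def prune_teachers(data: List[Example], threshold: float = 0.5) -> List[Example]:
    """Keep only examples from teachers whose labels agree with the
    crowd-trained model on more than `threshold` of their examples."""
    model = train_centroids(data)  # fit once on every teacher's labels
    agree: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for x, y, t in data:
        total[t] += 1
        agree[t] += int(predict(model, x) == y)
    keep = {t for t in total if agree[t] / total[t] > threshold}
    return [ex for ex in data if ex[2] in keep]
```

For instance, if teachers `"a"` and `"b"` label points by the sign of the feature while teacher `"c"` flips every label, the model fit on all six labels still follows the honest majority, so `"c"` agrees with it on none of its examples and is pruned. A cleaned training set could then be used to retrain the model.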
