Improving the Quality of Crowdsourced Image Labeling via Label Similarity

Authors: Yi-Li Fang, Hai-Long Sun, Peng-Peng Chen, Ting Deng

DOI: 10.1007/S11390-017-1770-7

Abstract: Crowdsourcing is an effective method to obtain large databases of manually labeled images, which is especially important for image understanding with supervised machine learning algorithms. However, for several kinds of image labeling tasks, e.g., dog breed recognition, it is hard to achieve high-quality results. Further optimizing the crowdsourcing workflow mainly involves task allocation and result inference. For task allocation, we design a two-round crowdsourcing framework, which contains a smart decision mechanism based on information entropy to determine whether to perform the second round of task allocation. Regarding result inference, after quantifying the similarity of all labels, two graphical models are proposed to describe the labeling process, and corresponding inference algorithms are designed to further improve the quality of image labeling. Extensive experiments were conducted on real-world tasks in Crowdflower and on synthetic datasets. The experimental results demonstrate the superiority of these methods in comparison with state-of-the-art methods.
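
The entropy-based decision mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual rule, so this is only one plausible reading: compute the Shannon entropy of the labels collected in the first round and request a second round when the workers disagree too much. The function names, the example labels, and the threshold of 1.0 bit are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def needs_second_round(labels, threshold=1.0):
    """Return True when first-round labels are too uncertain.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return label_entropy(labels) > threshold


# Hypothetical first-round answers for two images.
unanimous = ["husky", "husky", "husky", "husky"]
split = ["husky", "malamute", "husky", "samoyed"]

print(needs_second_round(unanimous))  # workers agree, entropy is zero
print(needs_second_round(split))      # workers disagree, entropy is 1.5 bits
```

Under these assumptions, an image with unanimous answers is settled after one round, while an image with split answers is sent out again, which is the kind of budget saving a two-round framework aims for.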
