Got Many Labels?: Deriving Topic Labels from Multiple Sources for Social Media Posts using Crowdsourcing and Ensemble Learning

Authors: Shuo Chang, Peng Dai, Jilin Chen, Ed H. Chi

DOI: 10.1145/2740908.2745401

Abstract: Online search and item recommendation systems are often based on being able to correctly label items with topical keywords. Typically, topical labelers analyze the main text associated with the item, but social media posts are often multimedia in nature and contain contents beyond the main text. Topic labeling for social media posts is therefore an important open problem for supporting effective social media search and recommendation. In this work, we present a novel solution to this problem for Google+ posts, in which we integrated a number of different entity extractors and annotators, each responsible for a part of the post (e.g. text body, embedded picture, video, or web link). To account for the varying quality of different annotator outputs, we first utilized crowdsourcing to measure the accuracy of individual entity annotators, and then used supervised machine learning to combine different entity annotators based on their relative accuracy. Evaluating using a ground truth data set, we found that our approach substantially outperforms topic labels obtained from the main text, as well as naive combinations of the individual annotators. By accurately applying topic labels according to their relevance to social media posts, the results enable better search and item recommendation.
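
The two-stage idea in the abstract (measure each annotator's accuracy from crowd judgments, then weight annotators by that accuracy when combining their labels) can be illustrated with a small sketch. This is not the authors' method: the paper trains a supervised model, whereas the sketch below swaps in a simple log-odds weighted vote. The function names, the annotator names (`text`, `image`, `link`), and all data are hypothetical.

```python
import math
from collections import defaultdict


def annotator_accuracy(judgments):
    """Estimate per-annotator accuracy from crowd judgments.

    judgments: iterable of (annotator_name, is_correct) pairs, where
    is_correct is a crowd worker's verdict on one label (hypothetical format).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for annotator, is_correct in judgments:
        total[annotator] += 1
        correct[annotator] += int(is_correct)
    # Laplace smoothing keeps every estimate strictly between 0 and 1.
    return {a: (correct[a] + 1) / (total[a] + 2) for a in total}


def combine_labels(post_annotations, accuracy):
    """Rank topic labels for one post by an accuracy-weighted vote.

    post_annotations: dict mapping annotator_name -> set of topic labels.
    Each annotator votes with weight log(p / (1 - p)), so an annotator
    with accuracy below 0.5 counts against the labels it proposes.
    """
    scores = defaultdict(float)
    for annotator, labels in post_annotations.items():
        p = accuracy[annotator]
        weight = math.log(p / (1 - p))
        for label in labels:
            scores[label] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical crowd judgments: 10 judged labels per annotator.
judgments = (
    [("text", True)] * 8 + [("text", False)] * 2
    + [("image", True)] * 3 + [("image", False)] * 7
    + [("link", True)] * 6 + [("link", False)] * 4
)
accuracy = annotator_accuracy(judgments)

# Hypothetical labels proposed for a single post.
post = {
    "text": {"cooking", "travel"},
    "image": {"travel"},
    "link": {"cooking"},
}
ranked = combine_labels(post, accuracy)
print(ranked)
```

In this toy data the unreliable image annotator pulls "travel" down, so "cooking" ranks first. A learned combiner, as described in the abstract, would fit such weights from ground truth labels instead of fixing them by formula.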

References (18)
Nicolas Usunier, Samy Bengio, Jason Weston. WSABIE: scaling up to large vocabulary image annotation. International Joint Conference on Artificial Intelligence, pp. 2764-2770 (2011). DOI: 10.5591/978-1-57735-516-8/IJCAI11-460
Jiakai Liu, Rong Hu, Meihong Wang, Yi Wang, Edward Y. Chang. Web-Scale Image Annotation. Pacific Rim Conference on Multimedia, pp. 663-674 (2008). DOI: 10.1007/978-3-540-89796-5_68
Daniel S. Weld, Christopher H. Lin. To Re(label), or Not To Re(label). National Conference on Artificial Intelligence (2014).
Gabriella Kazai, Jaap Kamps, Natasa Milic-Frayling. An analysis of human factors and label accuracy in crowdsourcing relevance judgments. Information Retrieval, vol. 16, pp. 138-178 (2013). DOI: 10.1007/S10791-012-9205-0
Shuang-Hong Yang, Alek Kolcz, Andy Schlaikjer, Pankaj Gupta. Large-scale high-precision topic modeling on Twitter. Knowledge Discovery and Data Mining, pp. 1907-1916 (2014). DOI: 10.1145/2623330.2623336
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, vol. 12, pp. 2825-2830 (2011).
Guy Shani, Asela Gunawardana. Evaluating Recommendation Systems. Recommender Systems Handbook, pp. 257-297 (2011). DOI: 10.1007/978-0-387-85820-3_8
Abhishek Gattani, Digvijay S. Lamba, Nikesh Garera, Mitul Tiwari, Xiaoyong Chai, Sanjib Das, Sri Subramaniam, Anand Rajaraman, Venky Harinarayan, AnHai Doan. Entity extraction, linking, classification, and tagging for social media. Proceedings of the VLDB Endowment, vol. 6, pp. 1126-1137 (2013). DOI: 10.14778/2536222.2536237
Kalina Bontcheva, Dominic Rout. Making Sense of Social Media Streams through Semantics: a Survey. Semantic Web, vol. 5, pp. 373-403 (2014). DOI: 10.3233/SW-130110
J. Jeon, V. Lavrenko, R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 119-126 (2003). DOI: 10.1145/860435.860459