Design and Implementation of Relevance Assessments Using Crowdsourcing

Authors: Omar Alonso, Ricardo Baeza-Yates

DOI: 10.1007/978-3-642-20161-5_16

Keywords: Information retrieval, Work (electrical), Data science, Information design, Crowdsourcing software development, Human interface guidelines, Approval rate, Crowdsourcing, Presentation, Relevance (information retrieval), Computer science

Abstract: In recent years, crowdsourcing has emerged as a viable platform for conducting relevance assessments. The main reason behind this trend is that it makes it possible to conduct experiments extremely fast, with good results and at low cost. However, as in any experiment, there are several details that can make an experiment work or fail. To gather useful results, user interface guidelines, inter-agreement metrics, and justification analysis are important aspects of a successful experiment. We explore the design and execution of relevance judgments using the Amazon Mechanical Turk crowdsourcing platform, introducing a methodology for relevance assessments and a series of experiments based on TREC-8 data under a fixed budget. Our findings indicate that workers can perform comparably to experts, even providing detailed feedback for certain query-document pairs. We also examine the importance of document presentation when performing assessment tasks. Finally, we show examples from our experiments that are interesting in their own right.
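The paper itself does not include code; as a minimal sketch of the inter-agreement metrics it mentions, the snippet below computes Fleiss' kappa over crowdsourced relevance labels. The function name, category labels, and example votes are hypothetical and chosen only for illustration, assuming every query-document pair is judged by the same number of workers.

```python
from collections import Counter

def fleiss_kappa(judgments, categories=("relevant", "not_relevant")):
    """Fleiss' kappa for items that are each judged by the same number of
    workers. `judgments` is a list of per-item label lists, e.g.
    [["relevant", "relevant", "not_relevant"], ...]. (Illustrative sketch.)"""
    n_items = len(judgments)
    n_raters = len(judgments[0])
    # Per-item counts of how many workers chose each category
    counts = [[Counter(item)[c] for c in categories] for item in judgments]
    # Observed agreement: average pairwise agreement per item
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 4 query-document pairs, 3 workers each
votes = [
    ["relevant", "relevant", "relevant"],
    ["relevant", "not_relevant", "relevant"],
    ["not_relevant", "not_relevant", "not_relevant"],
    ["relevant", "not_relevant", "not_relevant"],
]
print(round(fleiss_kappa(votes), 3))  # ~0.333 for this toy data
```

A kappa near 0 suggests agreement no better than chance, while values toward 1 indicate that workers judge the query-document pairs consistently; such a check is one way to filter or weight crowdsourced assessments before comparing them against expert judgments.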
