Reliable information retrieval evaluation with incomplete and biased judgements

Authors: Stefan Büttcher, Charles L. A. Clarke, Peter C. K. Yeung, Ian Soboroff

Keywords: Artificial intelligence, Computer science, Ranking, Machine learning, Pooling, Information retrieval, Set (psychology), Quality (business), Ranking (information retrieval)

Abstract: Information retrieval evaluation based on the pooling method is inherently biased against systems that did not contribute to the pool of judged documents. This may distort the results obtained about the relative quality of the systems evaluated and thus lead to incorrect conclusions about the performance of a particular ranking technique. We examine the magnitude of this effect and explore how it can be countered by automatically building an unbiased set of judgements from the original, biased judgements obtained through pooling. We compare the performance of this method with other approaches to the problem of incomplete judgements, such as bpref, and show that the proposed method leads to higher evaluation accuracy, especially if the set of manual judgements is rich in documents, but highly biased against some systems.

