Alternatives to Bpref

Author: Tetsuya Sakai

DOI: 10.1145/1277741.1277756

Keywords:

Abstract: Recently, a number of TREC tracks have adopted a retrieval effectiveness metric called bpref, which was designed for evaluation environments with incomplete relevance data. A graded-relevance version of bpref, called rpref, has also been proposed. However, we show that applying Q-measure, normalised Discounted Cumulative Gain (nDCG) or Average Precision (AveP) to condensed lists, obtained by filtering out all unjudged documents from the original ranked lists, is actually a better solution to the incompleteness problem than bpref. Furthermore, the use of graded relevance boosts the robustness of IR evaluation, and therefore Q-measure and nDCG based on condensed lists are the best choices. To this end, we use four test collections from NTCIR to compare ten different metrics in terms of system ranking stability and pairwise discriminative power.
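
The condensed-list idea is simple to illustrate. Below is a minimal, hypothetical Python sketch (not code from the paper): unjudged documents are removed from a run's ranked list before a standard metric is computed, here binary Average Precision for simplicity; all names and the toy data are illustrative assumptions.

# Condensed-list evaluation sketch (illustrative; binary AveP only).

def condense(ranked_docs, judgments):
    # Keep only documents that have a relevance judgment (judged docs).
    return [d for d in ranked_docs if d in judgments]

def average_precision(ranked_docs, judgments):
    # Binary AveP: judgments maps doc id -> relevance grade; grade > 0 means relevant.
    num_relevant = sum(1 for grade in judgments.values() if grade > 0)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if judgments.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant

# Toy data (hypothetical): d9 and d7 are unjudged, d2 is judged non-relevant.
judgments = {"d1": 2, "d2": 0, "d3": 1}
run = ["d9", "d1", "d7", "d2", "d3"]

avep = average_precision(run, judgments)                                  # unjudged treated as non-relevant
avep_condensed = average_precision(condense(run, judgments), judgments)   # condensed-list variant
print(round(avep, 4), round(avep_condensed, 4))                           # 0.45 vs. 0.8333

In the toy example, the condensed-list score ignores the unjudged documents d9 and d7 instead of counting them as non-relevant, which is the behaviour the paper argues handles incomplete relevance data better than bpref.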

References (16)
Emine Yilmaz, Javed A. Aslam. Estimating average precision with incomplete and imperfect judgments. Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM '06), pp. 102-111 (2006). DOI: 10.1145/1183614.1183633
Ian Soboroff, Charles Nicholas, Patrick Cahan. Ranking retrieval systems without relevance judgments. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 66-73 (2001). DOI: 10.1145/383952.383961
Ellen M. Voorhees, Chris Buckley. The effect of topic set size on retrieval experiment error. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), pp. 316-323 (2002). DOI: 10.1145/564376.564432
Javed A. Aslam, Robert Savell. On the effectiveness of evaluating retrieval systems in the absence of relevance judgments. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 361-362 (2003). DOI: 10.1145/860435.860501
Javed A. Aslam, Virgil Pavlu, Emine Yilmaz. A statistical method for system evaluation using incomplete judgments. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06), pp. 541-548 (2006). DOI: 10.1145/1148170.1148263
Gordon V. Cormack, Christopher R. Palmer, Charles L. A. Clarke. Efficient construction of large test collections. Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 282-289 (1998). DOI: 10.1145/290941.291009
Tetsuya Sakai. Evaluating evaluation metrics based on the bootstrap. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06), pp. 525-532 (2006). DOI: 10.1145/1148170.1148261
Kalervo Järvelin, Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, vol. 20, pp. 422-446 (2002). DOI: 10.1145/582415.582418
Justin Zobel. How reliable are the results of large-scale information retrieval experiments? Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 307-314 (1998). DOI: 10.1145/290941.291014
Ian Soboroff. Dynamic test collections. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06), pp. 276-283 (2006). DOI: 10.1145/1148170.1148220