Dissimilarity Based Query Selection for Efficient Preference Based IR Evaluation

Authors: Gabriella Kazai, Homer Sung

DOI: 10.1007/978-3-319-06028-6_15

Abstract: The evaluation of Information Retrieval (IR) systems has recently been exploring the use of preference judgments over two lists of search results, presented side-by-side to judges. Such preference judgments have been shown to capture a richer set of relevance criteria than traditional methods of collecting relevance labels per single document. However, preference judgments over lists are expensive to obtain and are less reusable, as any change to either side necessitates a new judgment. In this paper, we propose a way to measure the dissimilarity between the two sides in side-by-side evaluation experiments and show how this measure can be used to prioritize the queries to be judged in an offline setting. Our proposed measure, referred to as Weighted Ranking Difference (WRD), takes into account both the ranking differences and the similarity of the documents across the two sides, where a document may, for example, be a URL or a query suggestion. We empirically evaluate our measure on a large-scale, real-world dataset of crowdsourced preference judgments over ranked lists of auto-completion suggestions. We show that the WRD score is indicative of the probability of tie preference judgments and can, on average, save 25% of the judging resources.
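
The abstract does not give the WRD formula, so the following is only a minimal sketch of the general idea it describes: score how different two side-by-side ranked lists are, combining rank displacement with item similarity, then judge the most dissimilar queries first. Every name here (`text_similarity`, `rank_weight`, `list_dissimilarity`, `prioritize`), the 1/rank position discount, and the use of `difflib` string similarity are assumptions made for illustration, not the authors' definition of WRD.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Assumed item similarity in [0, 1]; identical strings score 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_weight(rank: int) -> float:
    """Assumed position discount: differences near the top count more."""
    return 1.0 / rank


def _one_sided(src, dst):
    """Rank-weighted cost of explaining each item of src by its best match in dst."""
    total = 0.0
    norm = 0.0
    for i, item in enumerate(src, start=1):
        best = 0.0
        for j, other in enumerate(dst, start=1):
            # Credit shrinks as the matched item moves further from rank i.
            credit = text_similarity(item, other) / (1 + abs(i - j))
            best = max(best, credit)
        w = rank_weight(i)
        total += w * (1.0 - best)
        norm += w
    return total / norm


def list_dissimilarity(left, right):
    """Symmetric dissimilarity in [0, 1]; 0.0 means the two sides are identical."""
    if not left and not right:
        return 0.0
    if not left or not right:
        return 1.0
    return 0.5 * (_one_sided(left, right) + _one_sided(right, left))


def prioritize(pairs):
    """Order queries by descending dissimilarity of their two result lists.

    pairs maps a query string to a (left_list, right_list) tuple.
    """
    scored = [(q, list_dissimilarity(l, r)) for q, (l, r) in pairs.items()]
    return sorted(scored, key=lambda qs: qs[1], reverse=True)
```

Under this sketch, queries whose two sides score near 0.0 are the ones most likely to be judged as ties, so a judging budget could be spent from the top of the `prioritize` output downward. Where to set the cut-off is an empirical question that the sketch does not address.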
