Authors: Gabriella Kazai, Homer Sung
DOI: 10.1007/978-3-319-06028-6_15
Keywords:
Abstract: The evaluation of Information Retrieval (IR) systems has recently been exploring the use of preference judgments over two lists of search results, presented side-by-side to judges. Such judgments have been shown to capture a richer set of relevance criteria than traditional methods of collecting relevance labels per single document. However, preference judgments are expensive to obtain and are less reusable, as any change to either side necessitates a new judgment. In this paper, we propose a way to measure the dissimilarity between the two sides in such experiments and show how it can be used to prioritize the queries to be judged in an offline setting. Our proposed measure, referred to as the Weighted Ranking Difference (WRD), takes into account both the ranking differences and the similarity of the documents across the two sides, where a document may, for example, be a URL or a query suggestion. We empirically evaluate our measure on a large-scale, real-world dataset of crowdsourced preference judgments over ranked auto-completion suggestions. We show that the WRD score is indicative of the probability of a tie and can, on average, save 25% of the judging resources.
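The abstract does not give the WRD formula, so the following is only a minimal illustrative sketch of how a weighted ranking-difference score might combine rank displacement with document similarity across the two sides. The function names (`wrd_score`, `text_sim`), the `difflib`-based similarity, and the 1/rank position weights are assumptions for illustration, not the paper's definition.

```python
from difflib import SequenceMatcher

def text_sim(a: str, b: str) -> float:
    """Surface similarity between two items (e.g. URLs or query suggestions)."""
    return SequenceMatcher(None, a, b).ratio()

def wrd_score(left: list[str], right: list[str]) -> float:
    """Illustrative weighted ranking-difference score (assumed form).

    Combines (a) how far each left-side item moves in the right-side
    ranking and (b) how similar a missing item is to its closest
    counterpart on the other side. Top ranks get higher weight.
    """
    n = max(len(left), len(right))
    total, norm = 0.0, 0.0
    for i, doc in enumerate(left):
        weight = 1.0 / (i + 1)               # top positions matter most
        if doc in right:
            j = right.index(doc)
            diff = abs(i - j) / n            # normalized rank displacement
        else:
            # Item absent on the right: penalize by its dissimilarity
            # to the most similar right-side item.
            best = max((text_sim(doc, d) for d in right), default=0.0)
            diff = 1.0 - best
        total += weight * diff
        norm += weight
    return total / norm if norm else 0.0

# Identical sides score 0 (a near-certain tie); divergent sides score higher.
print(wrd_score(["a", "b", "c"], ["a", "b", "c"]))   # 0.0
print(wrd_score(["a", "b", "c"], ["c", "a", "b"]))   # > 0
```

Under this reading, queries whose two sides yield a near-zero score would be deprioritized for judging, since their side-by-side comparisons are the most likely to end in a tie.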