Unbiased Ranking Evaluation on a Budget

Authors: Tobias Schnabel, Adith Swaminathan, Thorsten Joachims


Abstract: We address the problem of assessing the quality of a ranking system (e.g., search engine, recommender system, review ranker) given a fixed budget for collecting expert judgments. In particular, we propose a method that selects which items to judge in order to optimize the accuracy of the quality estimate. Our method is not only efficient, but also provides estimates that are unbiased, unlike common approaches that tend to underestimate performance or that have a bias against new systems that are evaluated by re-using previous relevance scores.
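The abstract does not spell out the estimator. As an illustration only, the sketch below shows one standard way to obtain an unbiased estimate of an additive ranking metric from a limited number of judgments: importance sampling over ranked positions. The names `dcg_weights`, `estimate_metric`, and `judge`, the DCG-style discount, and the choice of sampling distribution are all assumptions made for this example, not the authors' published method.

```python
import math
import random


def dcg_weights(n):
    """Position discounts 1 / log2(rank + 1) for ranks 1..n (DCG-style, assumed)."""
    return [1.0 / math.log2(rank + 2) for rank in range(n)]


def estimate_metric(weights, judge, budget, probs, rng):
    """Importance-sampling estimate of sum_i weights[i] * relevance[i].

    judge(i) is a hypothetical callback returning the expert relevance of the
    item at position i. Positions are drawn with replacement from probs, so the
    number of distinct judgments never exceeds budget.
    """
    total_p = sum(probs)
    probs = [p / total_p for p in probs]  # normalize so the ratios below are valid
    drawn = rng.choices(range(len(weights)), weights=probs, k=budget)
    judged = {}  # cache: each distinct item is judged only once
    total = 0.0
    for i in drawn:
        if i not in judged:
            judged[i] = judge(i)
        # Dividing by probs[i] corrects for how often position i is sampled.
        total += weights[i] * judged[i] / probs[i]
    return total / budget


# Toy usage with made-up relevance labels.
relevance = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
weights = dcg_weights(len(relevance))
estimate = estimate_metric(weights, relevance.__getitem__, budget=4,
                           probs=weights, rng=random.Random(0))
true_value = sum(w * r for w, r in zip(weights, relevance))
```

Sampling in proportion to the position weights spends more of the budget near the top of the ranking, where errors affect the metric most. The estimate stays unbiased as long as every position with nonzero weight has a nonzero sampling probability.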




