Online Evaluation for Information Retrieval

Authors: Filip Radlinski, Lihong Li, Katja Hofmann

DOI:

Keywords:

Abstract: Online evaluation is one of the most common approaches to measuring the effectiveness of an information retrieval system. It involves fielding the system to real users and observing these users' interactions in-situ while they engage with it. This allows actual users with real-world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements alternative offline approaches, which may provide more easily interpretable outcomes, yet are often less realistic when measuring quality and user experience.

In this survey, we give an overview of online evaluation techniques for information retrieval. We show how online evaluation is used in controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation by the differently sized experimental units commonly of interest: documents, lists, and sessions. Additionally, we include an extensive discussion of recent work on data re-use and estimation based on historical data.

A substantial part of the survey focuses on practical issues: how to run evaluations in practice, how to select experiment parameters, how to take into account the ethical considerations inherent in online evaluations, and the limitations experimenters should be aware of. While most published work on online experimentation today concerns large-scale systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we highlight recent work that makes online evaluation easier to use at smaller scales, and we encourage studying real-world information seeking across a wide range of scenarios. Finally, we present a summary of the area, describe open problems, and postulate future directions.
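To make the "relative assessment" experiment designs mentioned above concrete, here is a minimal Python sketch of team-draft-style interleaving: two rankers' result lists are merged into one list shown to the user, and clicks are credited to whichever ranker contributed each clicked document. This is an illustrative simplification, not code from the survey; the function names, document identifiers, and the binary click-credit rule are all assumptions for the example.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Merge two rankings into one interleaved list, remembering which
    ranker ("team") contributed each document. Simplified sketch."""
    interleaved = []
    team_a, team_b = set(), set()
    while len(interleaved) < length:
        # Highest-ranked document from each ranker not yet shown.
        next_a = next((d for d in ranking_a if d not in interleaved), None)
        next_b = next((d for d in ranking_b if d not in interleaved), None)
        if next_a is None and next_b is None:
            break
        # The team with fewer contributions picks next; ties broken randomly.
        pick_a = next_b is None or (
            next_a is not None
            and (len(team_a) < len(team_b)
                 or (len(team_a) == len(team_b) and random.random() < 0.5))
        )
        if pick_a:
            interleaved.append(next_a)
            team_a.add(next_a)
        else:
            interleaved.append(next_b)
            team_b.add(next_b)
    return interleaved, team_a, team_b

def credit_clicks(clicked, team_a, team_b):
    """Credit clicks to the contributing ranker; more credited clicks wins
    this query impression (a simple, assumed credit rule)."""
    a, b = len(set(clicked) & team_a), len(set(clicked) & team_b)
    return "A" if a > b else "B" if b > a else "tie"

# Hypothetical impression: two rankers' lists and the observed clicks.
ranked_list, team_a, team_b = team_draft_interleave(
    ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d6"], length=6)
print(ranked_list, credit_clicks(["d3", "d5"], team_a, team_b))
```

Aggregating the per-impression outcomes over many queries gives a relative preference between the two rankers, in contrast to absolute designs such as A/B tests that compare a metric computed separately for each system.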
