Authors: Filip Radlinski, Lihong Li, Katja Hofmann
DOI:
Keywords:
Abstract: Online evaluation is one of the most common approaches to measure the effectiveness of an information retrieval system. It involves fielding the system to real users, and observing these users' interactions in-situ while they engage with it. This allows actual users with real-world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements alternative offline evaluation approaches, which may provide more easily interpretable outcomes, yet are often less realistic when measuring quality and user experience.

In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used in controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on the sizes of experimental units commonly of interest: documents, lists, and sessions. Additionally, we include an extensive discussion of recent work on data re-use and estimation from historical data.

A substantial part of this survey focuses on practical issues: how to run evaluations in practice, how to select experimental parameters, how to take into account the ethical considerations inherent in online evaluations, and the limitations experimenters should be aware of. While most published work on online experimentation today is at large scale in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we highlight recent work that makes online evaluation easier to use at smaller scales, and we encourage studying real-world information seeking across a wide range of scenarios. Finally, we present a summary of the area, describe open problems, and postulate future directions.