Modeling score distributions in information retrieval

Authors: Avi Arampatzis, Stephen Robertson

DOI: 10.1007/S10791-010-9145-5

Abstract: We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most-likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being 'friendly' to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.
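
As an illustration of the non-convexity issue the abstract mentions, here is a minimal sketch of a normal-exponential mixture in which relevant scores are taken as normal and non-relevant scores as exponential. The function names and all parameter values (PRIOR, MU, SIGMA, LAM) are assumptions chosen for illustration, not values from the paper. Setting the derivative of the log-likelihood ratio to zero gives a turning point at s* = mu + lam * sigma^2; above it the probability of relevance falls as the score rises, which is what violates recall-fallout convexity.

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of the normal component (assumed here to model relevant scores)."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(s, lam):
    """Density of the exponential component (assumed to model non-relevant scores)."""
    return lam * math.exp(-lam * s) if s >= 0 else 0.0


def posterior_relevance(s, prior, mu, sigma, lam):
    """P(relevant | score s) under the two-component mixture, via Bayes' rule."""
    rel = prior * normal_pdf(s, mu, sigma)
    non = (1.0 - prior) * exponential_pdf(s, lam)
    total = rel + non
    return rel / total if total > 0 else 0.0


def turning_point(mu, sigma, lam):
    """Score above which the posterior starts to decrease.

    The log-likelihood ratio is -(s - mu)^2 / (2 sigma^2) + lam * s + const,
    whose derivative vanishes at s = mu + lam * sigma^2.
    """
    return mu + lam * sigma ** 2


# Hypothetical parameters, for illustration only.
PRIOR, MU, SIGMA, LAM = 0.1, 3.0, 1.0, 2.0

if __name__ == "__main__":
    print("turning point:", turning_point(MU, SIGMA, LAM))
    for s in (1.0, 3.0, 5.0, 7.0, 9.0):
        p = posterior_relevance(s, PRIOR, MU, SIGMA, LAM)
        print(f"score={s:4.1f}  P(rel|score)={p:.6f}")
```

With these assumed parameters the turning point lies two standard deviations above the relevant mean, so only a small share of relevant documents would score beyond it. That is in keeping with the abstract's remark that the problem is practically not severe, though the sketch is not evidence for it.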

References (39)
Avi Arampatzis, Jean Beney, Theo P. van der Weide, Cornelis H. A. Koster. Incrementality, Half-life, and Threshold Optimization for Adaptive Document Filtering. Text Retrieval Conference (2000).
William S. Cooper, Fredric C. Gey, Aitao Chen. Experiments in the Probabilistic Retrieval of Full Text Documents. Text Retrieval Conference, pp. 127-134 (1994).
Jamie Callan. Distributed Information Retrieval. The Information Retrieval Series, vol. 5, pp. 127-150 (2002). DOI: 10.1007/0-306-47019-5_5
Kevyn Collins-Thompson, Jamie Callan, Paul Ogilvie, Yi Zhang. Information Filtering, Novelty Detection, and Named-Page Finding. Text Retrieval Conference (2002).
N. Fuhr, U. Pfeifer, C. Buckley, C. Bremkamp, M. Pollmann. Probabilistic learning approaches for indexing and retrieval with the TREC-2 collection. Text Retrieval Conference, pp. 67-74 (1993).
David Hawking, Stephen Robertson. On Collection Size and Retrieval Effectiveness. Information Retrieval, vol. 6, pp. 99-105 (2003). DOI: 10.1023/A:1022904715765
Evangelos Kanoulas, Virgil Pavlu, Keshi Dai, Javed A. Aslam. Modeling the Score Distributions of Relevant and Non-relevant Documents. International Conference on the Theory of Information Retrieval, pp. 152-163 (2009). DOI: 10.1007/978-3-642-04417-5_14
Henrik Nottelmann, Norbert Fuhr. From Uncertain Inference to Probability of Relevance for Advanced IR Applications. Lecture Notes in Computer Science, pp. 235-250 (2003). DOI: 10.1007/3-540-36618-0_17
Jacques Savoy. Report on CLEF-2003 Multilingual Tracks. Cross Language Evaluation Forum, vol. 3237, pp. 64-73 (2003). DOI: 10.1007/978-3-540-30222-3_6