An Approach to Information Retrieval Based on Statistical Model Selection

Author: Miles Efron

Abstract: Building on previous work in the field of language modeling information retrieval (IR), this paper proposes a novel approach to document ranking based on statistical model selection. The proposed approach offers two main contributions. First, we posit the notion of a document's "null model," a language model that conditions our assessment of the document model's significance with respect to the query. Second, we introduce an information-theoretic model complexity penalty into document ranking. We rank documents on a penalized log-likelihood ratio comparing the probability that each document model generated the query versus the likelihood that a corresponding "null" model generated it. Each model is assessed by the Akaike information criterion (AIC), the expected Kullback-Leibler divergence between the observed model (null or non-null) and the underlying model that generated the data. We report experimental results where the model selection approach offers improvement over traditional LM retrieval.
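The ranking idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It only uses the standard definition AIC = -2 log L + 2k and scores a document by how much lower its model's AIC is than the AIC of a null model on the same query. Everything else is an assumption made for illustration: the unigram document model with linear (Jelinek-Mercer style) smoothing, the use of the collection model as the null model, the parameter counts `k_doc` and `k_null`, and all function names.

```python
import math
from collections import Counter


def aic(log_likelihood, num_params):
    """Akaike information criterion, -2 log L + 2k. Lower is better."""
    return -2.0 * log_likelihood + 2.0 * num_params


def query_log_likelihood(query_terms, term_probs):
    """Log-probability of the query under a unigram model.

    Every query term must have a nonzero probability in term_probs.
    """
    return sum(math.log(term_probs[t]) for t in query_terms)


def smoothed_doc_model(doc_terms, collection_probs, lam=0.5):
    """Unigram document model interpolated with the collection model.

    Assumption: linear smoothing with weight lam on the collection model.
    doc_terms must be non-empty.
    """
    counts = Counter(doc_terms)
    n = len(doc_terms)
    return {
        t: (1.0 - lam) * counts[t] / n + lam * p
        for t, p in collection_probs.items()
    }


def aic_score(query_terms, doc_terms, collection_probs, lam=0.5):
    """AIC(null model) - AIC(document model); higher ranks first.

    Assumptions (hypothetical, not taken from the paper):
      - the null model is the collection model, with k_null = 0 parameters
      - the document model has one parameter per distinct document term
    """
    doc_model = smoothed_doc_model(doc_terms, collection_probs, lam)
    k_doc = len(set(doc_terms))
    k_null = 0
    aic_doc = aic(query_log_likelihood(query_terms, doc_model), k_doc)
    aic_null = aic(query_log_likelihood(query_terms, collection_probs), k_null)
    return aic_null - aic_doc


# Toy collection with a uniform background distribution.
collection_probs = {"model": 0.25, "selection": 0.25,
                    "retrieval": 0.25, "ranking": 0.25}
query = ["model", "selection"]
doc_a = ["model", "selection", "model"]
doc_b = ["retrieval", "ranking", "retrieval"]

score_a = aic_score(query, doc_a, collection_probs)
score_b = aic_score(query, doc_b, collection_probs)
```

Under these assumptions the score equals twice the log-likelihood ratio minus twice the difference in parameter counts, so a document is rewarded for explaining the query better than the null model and penalized for the complexity of its own model. In the toy data, `doc_a` shares terms with the query and receives a higher score than `doc_b`.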
