An Investigation of Dirichlet Prior Smoothing's Performance Advantage

Authors: Mark D. Smucker, James Allan


Keywords: Dirichlet distribution, Pattern recognition, Computer science, Latent Dirichlet allocation, AKA, Document model, Additive smoothing, Artificial intelligence, Language model, Smoothing

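Abstract: In the language modeling approach to information retrieval, Dirichlet prior smoothing frequently outperforms fixed linear interpolated (aka Jelinek-Mercer) smoothing. The only difference between Dirichlet prior smoothing and Jelinek-Mercer smoothing is that Dirichlet prior smoothing determines the amount of smoothing based on a document's length. Our hypothesis was that Dirichlet prior smoothing has an implicit document prior that favors longer documents. We tested our hypothesis by first calculating the prior for a given document length from the known relevant documents. We then determined the performance of each smoothing method with and without the document prior. We discovered that when given the prior, Jelinek-Mercer smoothing matches or exceeds the performance of Dirichlet prior smoothing. Dirichlet prior smoothing's performance advantage appears to come more from an implicit prior favoring longer documents than from better estimation of the document model.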
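The two smoothing methods compared above can be written in a few lines. Jelinek-Mercer uses p(w|d) = (1 - λ) c(w,d)/|d| + λ p(w|C) with a fixed λ. Dirichlet prior smoothing uses p(w|d) = (c(w,d) + μ p(w|C)) / (|d| + μ), which is the same interpolation with a length-dependent weight λ_d = μ / (|d| + μ) that shrinks as the document grows. The sketch below is a minimal illustration under stated assumptions, not the authors' code. It assumes a unigram query-likelihood model and a collection model `p_coll` given as a dict that covers every query term. All function names are hypothetical. The `log_prior` argument stands in for a document prior such as the length-based one described in the abstract; how that prior is estimated is not shown.

```python
import math
from collections import Counter


def p_ml(word, doc_counts, doc_len):
    """Maximum-likelihood estimate c(w, d) / |d| (0 for an empty document)."""
    return doc_counts[word] / doc_len if doc_len else 0.0


def jelinek_mercer(word, doc_counts, doc_len, p_coll, lam):
    """Fixed linear interpolation: (1 - lam) * p_ml(w|d) + lam * p(w|C)."""
    return (1.0 - lam) * p_ml(word, doc_counts, doc_len) + lam * p_coll[word]


def dirichlet(word, doc_counts, doc_len, p_coll, mu):
    """Dirichlet prior smoothing: (c(w, d) + mu * p(w|C)) / (|d| + mu)."""
    return (doc_counts[word] + mu * p_coll[word]) / (doc_len + mu)


def dirichlet_lambda(doc_len, mu):
    """Interpolation weight that makes Jelinek-Mercer equal Dirichlet smoothing.

    Shorter documents get a larger weight on the collection model, i.e. more smoothing.
    """
    return mu / (doc_len + mu)


def query_log_likelihood(query, doc_tokens, p_coll, smoother, param, log_prior=0.0):
    """Score = log p(d) + sum over query terms of log p(w|d).

    Assumption: every query term has p_coll[w] > 0, so no log(0) occurs.
    log_prior is a hypothetical hook for a document prior (0.0 means uniform).
    """
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return log_prior + sum(
        math.log(smoother(w, counts, n, p_coll, param)) for w in query
    )


if __name__ == "__main__":
    # Toy collection model; values are illustrative only.
    p_coll = {"a": 0.5, "b": 0.25, "c": 0.25}
    doc = ["a", "a", "b"]
    mu = 10.0
    lam = dirichlet_lambda(len(doc), mu)
    print(query_log_likelihood(["a", "c"], doc, p_coll, dirichlet, mu))
    print(query_log_likelihood(["a", "c"], doc, p_coll, jelinek_mercer, lam))
```

Both printed scores agree up to floating-point rounding, because the Jelinek-Mercer weight was set to μ / (|d| + μ). With a single fixed λ across all documents, the two methods diverge only through document length. That is the difference the abstract isolates by adding an explicit length prior to Jelinek-Mercer.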
