Authors: Mark D. Smucker, James Allan
DOI:
Keywords: Dirichlet distribution, Pattern recognition, Computer science, Latent Dirichlet allocation, AKA, Document model, Additive smoothing, Artificial intelligence, Language model, Smoothing
Abstract: In the language modeling approach to information retrieval, Dirichlet prior smoothing frequently outperforms fixed linear interpolated (aka Jelinek-Mercer) smoothing. The only difference between Dirichlet prior smoothing and linear interpolated smoothing is that Dirichlet prior smoothing determines the amount of smoothing based on a document's length. Our hypothesis was that Dirichlet prior smoothing has an implicit document prior that favors longer documents. We tested our hypothesis by first calculating a document prior for a given document length from known relevant documents, and then determined the retrieval performance of each smoothing method with and without this prior. We discovered that when linear interpolated smoothing is given a document prior, it matches or exceeds the performance of Dirichlet prior smoothing. Dirichlet prior smoothing's advantage appears to come more from its favoring of longer documents than from better estimation of the document model.
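The abstract's central claim, that the only difference between the two smoothing methods is a length-dependent smoothing weight, can be seen directly from the standard formulas. Below is a minimal sketch (not the paper's code; function and parameter names such as `mu` and `lam` are illustrative, though conventional in the IR literature) showing that Dirichlet prior smoothing equals Jelinek-Mercer smoothing whose interpolation weight is set per document to mu / (doc_len + mu):

```python
def jelinek_mercer(count_w_d, doc_len, p_w_coll, lam=0.5):
    """Fixed linear interpolation (Jelinek-Mercer) smoothing.

    Mixes the document's maximum-likelihood estimate with the
    collection language model using a fixed weight lam.
    """
    return (1 - lam) * (count_w_d / doc_len) + lam * p_w_coll


def dirichlet_prior(count_w_d, doc_len, p_w_coll, mu=2000):
    """Dirichlet prior smoothing.

    Adds mu pseudo-counts distributed according to the collection
    model; the effective smoothing weight shrinks as doc_len grows.
    """
    return (count_w_d + mu * p_w_coll) / (doc_len + mu)


# Algebraically, Dirichlet prior smoothing is Jelinek-Mercer with a
# document-length-dependent weight lam = mu / (doc_len + mu):
count_w_d, doc_len, p_w_coll, mu = 3, 100, 0.01, 2000
equivalent_lam = mu / (doc_len + mu)
print(dirichlet_prior(count_w_d, doc_len, p_w_coll, mu))
print(jelinek_mercer(count_w_d, doc_len, p_w_coll, equivalent_lam))
```

Because the effective weight mu / (doc_len + mu) decreases with document length, longer documents are smoothed less, which is the length-dependence the paper investigates as an implicit document prior.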