Inferring probability of relevance using the method of logistic regression

Author: Fredric C. Gey

Keywords: Logistic regression, Data mining, Statistical hypothesis testing, Cosine similarity, Inference, Relevance (information retrieval), tf–idf, Statistics, Document retrieval, Multinomial logistic regression, Vector space model, Computer science, Probabilistic logic, Weighting

Abstract: This research evaluates a model for probabilistic text and document retrieval; the model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean μ = 0 and standard deviation σ = 1), the method allows one to apply logistic coefficients derived from a training collection to other document collections, with little loss of predictive power. The model is applied to three well-known information retrieval test collections, and the results are compared directly to the particular vector space model of retrieval which uses term-frequency/inverse-document-frequency (tfidf) weighting and the cosine similarity measure. In the comparison, the logistic inference method performs significantly better than (in two collections) or equally well as (in the third collection) the tfidf/cosine vector space model. The differences in performances of the two models were subjected to statistical tests to see if the differences are statistically significant or could have occurred by chance.
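
The standardization step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two clue columns, the toy relevance labels, the function names, and the plain gradient-ascent fit (the paper's own clues and fitting procedure are not given in this record). The sketch standardizes each clue to mean 0 and standard deviation 1 within its own collection, fits logistic coefficients on a training collection, and then reuses those coefficients to rank pairs from a second collection whose raw clue values are on a different scale.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale one clue's values to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def standardize_clues(rows):
    """Standardize every clue (column) of a list of feature rows."""
    columns = [standardize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relevance_probability(weights, x):
    """Estimated probability of relevance; weights[0] is the intercept."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.1, epochs=1000):
    """Fit logistic coefficients by batch gradient ascent (illustrative only)."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - relevance_probability(weights, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical training collection: two clues per query-document pair.
train_clues = [[1.0, 0.2], [2.0, 0.5], [3.0, 0.4],
               [4.0, 0.1], [5.0, 0.6], [6.0, 0.3]]
train_labels = [0, 0, 1, 0, 1, 1]  # 1 = judged relevant (toy values)

train_z = standardize_clues(train_clues)
weights = fit_logistic(train_z, train_labels)

# Hypothetical second collection with clue values on a different raw scale.
other_clues = [[10.0, 2.0], [30.0, 5.0], [20.0, 3.0]]
other_z = standardize_clues(other_clues)
scores = [relevance_probability(weights, x) for x in other_z]

# Rank the second collection's pairs by estimated probability of relevance.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Because each collection is standardized with its own mean and standard deviation, the coefficients are applied to clues on a common scale, which is the property the abstract credits for the small loss of predictive power when moving between collections.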

