Inferring probability of relevance using the method of logistic regression

Author: Fredric C. Gey

DOI: 10.5555/188490.188560

Keywords: Logistic regression, Data mining, Statistical hypothesis testing, Cosine similarity, Inference, Relevance (information retrieval), tf–idf, Statistics, Document retrieval, Multinomial logistic regression, Vector space model, Computer science, Probabilistic logic, Weighting

Abstract: This research evaluates a model for probabilistic text and document retrieval; the model utilizes the technique of logistic regression to obtain equations which rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean μ = 0 and standard deviation σ = 1), the method allows one to apply logistic coefficients derived from a training collection to other document collections, with little loss of predictive power. The model is applied to three well-known information retrieval test collections, and the results are compared directly to the particular vector space model of retrieval which uses term-frequency/inverse-document-frequency (tfidf) weighting and the cosine similarity measure. In the comparison, the logistic inference method performs significantly better than (in two collections) or equally well as (in the third collection) the tfidf/cosine vector space model. The differences in performances of the two models were subjected to statistical tests to see if the differences are statistically significant or could have occurred by chance.
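The abstract does not name the clues or the fitting procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions: each query-document pair is described by two hypothetical clues (`overlap`, the number of query terms found in the document, and `doc_length`), and plain batch gradient ascent stands in for whatever maximum-likelihood routine the paper actually used. Each clue is standardized within its own collection, logistic coefficients are fitted on a training collection, and the same coefficients are then applied to a second, hypothetical collection whose raw values sit on a different scale.

```python
import math
from typing import List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Map one clue's values to z-scores (mean 0, population standard deviation 1)."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    if sigma == 0.0:
        return [0.0] * n  # a constant clue carries no ranking information
    return [(x - mu) / sigma for x in column]


def standardize_rows(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize every clue (column) within a single collection."""
    columns = [standardize(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: converts log-odds to a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_odds(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the intercept; w[1:] are the coefficients of the clues in x.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit coefficients by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(log_odds(w, xi))  # observed minus predicted
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def rank_by_relevance(w: Sequence[float], rows: Sequence[Sequence[float]]):
    """Standardize a collection's clues, then return (index, P(relevant)), best first."""
    z_rows = standardize_rows(rows)
    probs = [sigmoid(log_odds(w, x)) for x in z_rows]
    return sorted(enumerate(probs), key=lambda t: t[1], reverse=True)


# Hypothetical training collection: [overlap, doc_length] for each query-document pair.
train_rows = [[5, 100], [4, 120], [6, 90], [1, 110], [0, 95], [2, 130]]
train_rel = [1, 1, 1, 0, 0, 0]  # 1 = judged relevant
weights = fit_logistic(standardize_rows(train_rows), train_rel)

# Hypothetical second collection: same pattern, but every raw value is ten times larger.
other_rows = [[10 * o, 10 * d] for o, d in train_rows]
print(rank_by_relevance(weights, other_rows))
```

Because each collection is standardized against its own mean and standard deviation, multiplying every raw clue by ten leaves the z-scores, and therefore the predicted probabilities and the ranking, unchanged. This is a toy illustration of why standardized clues let coefficients fitted on one collection be reused on another; it is not a reconstruction of the paper's actual clues or results.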
