A study of parameter tuning for term frequency normalization

Authors: Ben He, Iadh Ounis

DOI: 10.1145/956863.956867

Abstract: Most current term frequency normalization approaches for information retrieval involve the use of parameters. The tuning of these parameters has an important impact on the overall performance of the information retrieval system. Indeed, a small variation in the involved parameter(s) could lead to an important variation in the precision/recall values. Most current tuning approaches are dependent on the document collections. As a consequence, the effective parameter value cannot be obtained for a given new collection without extensive training data. In this paper, we propose a novel and robust method for the tuning of term frequency normalization parameter(s), by measuring the normalization effect on the within-document frequency of the query terms. As an illustration, we apply our method to Amati & Van Rijsbergen's so-called normalization 2. The experiments for the ad-hoc TREC-6,7,8 tasks and the TREC-8,9,10 Web tracks show that the new method is independent of the collections and able to provide reliable and good performance.
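For readers unfamiliar with the normalization the abstract refers to, the following is a minimal sketch of "normalization 2" as it is commonly stated in the divergence-from-randomness framework: tfn = tf * log2(1 + c * avg_l / l), where l is the document length, avg_l is the average document length in the collection, and c is the free parameter being tuned. The function name, argument names, and sample values are illustrative assumptions. The sketch shows only why the choice of c matters; it is not the tuning method proposed in the paper.

```python
import math


def normalization_2(tf, doc_len, avg_doc_len, c=1.0):
    """Normalized term frequency: tf * log2(1 + c * avg_doc_len / doc_len).

    Names and the default c=1.0 are illustrative assumptions,
    not values taken from the paper.
    """
    if doc_len <= 0:
        raise ValueError("doc_len must be positive")
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


if __name__ == "__main__":
    # Made-up values: the same raw frequency in a short and a long document.
    for c in (0.5, 1.0, 7.0):
        short_doc = normalization_2(tf=3, doc_len=100, avg_doc_len=200, c=c)
        long_doc = normalization_2(tf=3, doc_len=400, avg_doc_len=200, c=c)
        print(f"c={c}: short={short_doc:.3f} long={long_doc:.3f}")
```

With the same raw frequency, the shorter document always receives the larger normalized frequency, and the gap between the two changes with c, which is the sensitivity the abstract describes.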

References (9)
Mike Gatford, Micheline Hancock-Beaulieu, Susan Jones, Stephen E. Robertson, Steve Walker. Okapi at TREC. Text Retrieval Conference, pp. 109-123 (1994).
Gianni Amati, Cornelis Joost van Rijsbergen. Term Frequency Normalization via Pareto Distributions. Lecture Notes in Computer Science, pp. 183-192 (2002). DOI: 10.1007/3-540-45886-7_13
Amit Singhal, Chris Buckley, Mandar Mitra. Pivoted document length normalization. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 21-29 (1996). DOI: 10.1145/3130348.3130365
S. E. Robertson, S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 232-241 (1994). DOI: 10.5555/188490.188561
Abdur Chowdhury, M. Catherine McCabe, David Grossman, Ophir Frieder. Document normalization revisited. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), pp. 381-382 (2002). DOI: 10.1145/564376.564454
Gianni Amati, Cornelis Joost van Rijsbergen. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems, vol. 20, pp. 357-389 (2002). DOI: 10.1145/582415.582416
Chengxiang Zhai, John Lafferty. A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 334-342 (2001). DOI: 10.1145/3130348.3130377
K. Sparck Jones, S. Walker, S. E. Robertson. A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, vol. 36, pp. 779-808 (2000). DOI: 10.1016/S0306-4573(00)00015-7
G. Salton, A. Wong, C. S. Yang. A vector space model for automatic indexing. Communications of the ACM, vol. 18, pp. 613-620 (1975). DOI: 10.1145/361219.361220