Static pruning of terms in inverted files

Authors: Roi Blanco, Álvaro Barreiro

DOI: 10.1007/978-3-540-71496-5_9

Keywords: Pruning (decision trees), Reduction (complexity), Data mining, Inverted index, Computer science

Abstract: This paper addresses the problem of identifying collection-dependent stop-words in order to reduce the size of inverted files. We present four methods to automatically recognise stop-words, analyse the tradeoff between efficiency and effectiveness, and compare them with a previous pruning approach. The experiments allow us to conclude that in some situations stop-word pruning is competitive with respect to other inverted file reduction techniques.
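
To make the idea of pruning whole terms from an inverted file concrete, here is a minimal sketch. The abstract does not name the four recognition methods, so the ranking used here (raw document frequency) is purely an illustrative assumption, and the helper names `build_index`, `prune_stop_words`, and `postings_count` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def prune_stop_words(index, n_terms):
    """Remove the n_terms terms with the longest posting lists.

    Assumption: document frequency stands in for a stop-word score;
    the paper's own four methods are not described in the abstract.
    """
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(index, key=lambda t: (-len(index[t]), t))
    stop_words = set(ranked[:n_terms])
    pruned = {t: p for t, p in index.items() if t not in stop_words}
    return pruned, stop_words


def postings_count(index):
    """Total number of postings, a rough proxy for inverted file size."""
    return sum(len(postings) for postings in index.values())


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and the dog",
]
index = build_index(docs)
pruned, stop_words = prune_stop_words(index, n_terms=1)
print(stop_words, postings_count(index), postings_count(pruned))
```

Because the pruning decision is made once at indexing time and removes entire posting lists, the size reduction is paid for with a possible loss in retrieval effectiveness for queries that contain the removed terms, which is the tradeoff the abstract refers to.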

References (14)
Mike Gatford, Micheline Hancock-Beaulieu, Susan Jones, Stephen E. Robertson, Steve Walker. Okapi at TREC. Text Retrieval Conference. pp. 109-123 (1994)
Andrew Turpin, Alistair Moffat. Compression and coding algorithms. (2002)
Stephen E. Robertson, Steve Walker. Okapi/Keenbow at TREC-8. Text Retrieval Conference. pp. 151-162 (1999)
Dirk Bahle, Hugh E. Williams, Justin Zobel. Efficient phrase querying with an auxiliary index. Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '02. pp. 215-221 (2002). DOI: 10.1145/564376.564415
David Carmel, Doron Cohen, Ronald Fagin, Eitan Farchi, Michael Herscovici, Yoelle S. Maarek, Aya Soffer. Static index pruning for information retrieval systems. International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 43-50 (2001). DOI: 10.1145/383952.383958
Edleno S De Moura, Célia F dos Santos, Daniel R Fernandes, Altigran S Silva, Pavel Calado, Mario A Nascimento. Improving Web search efficiency via a locality based static pruning method. The Web Conference. pp. 235-244 (2005). DOI: 10.1145/1060745.1060783
Christopher Fox. A stop list for general text. International ACM SIGIR Conference on Research and Development in Information Retrieval. vol. 24, pp. 19-21 (1989). DOI: 10.1145/378881.378888
S. E. Robertson, K. Sparck Jones. Relevance weighting of search terms. Journal of the Association for Information Science and Technology. vol. 27, pp. 129-146 (1976). DOI: 10.1002/ASI.4630270302
G. Salton, C. S. Yang, C. T. Yu. A Theory of Term Importance in Automatic Text Analysis. Journal of the Association for Information Science and Technology. vol. 26, pp. 33-44 (1974). DOI: 10.1002/ASI.4630260106
Howard Turtle, James Flood. Query evaluation: strategies and optimizations. Information Processing and Management. vol. 31, pp. 831-850 (1995). DOI: 10.1016/0306-4573(95)00020-H