Compactness — A useful feature for generating search index

Authors: K. P. Pushpalatha, G. Raju

DOI: 10.1109/ICTEE.2012.6208623

Keywords:

Abstract: Generating meaningful or relevant keywords for information retrieval using data mining techniques is a highly [...] field. Term Discrimination Values (TDVs) are better measures than frequency-based term weights for selecting keywords. Terms with high TDVs will generate good keywords. El-Hamdouchi, P. Willett and Carolyn J. Crouch have developed various algorithms to compute TDVs. In earlier days, weighted term frequency was used to compute TDVs, but these simple frequencies are not enough for retrieving documents. Here we use some new features, connected with the distribution of terms within a document, called distributional features. Distributional features such as First Appearance, Last Appearance, and Compactness (based on the number of parts, the distance between the first and last occurrence, the variance of the positions of occurrences, etc.) are pointers to the importance of a term in a document. Experiments have shown that a combination of these features gives much better results than the individual features in the case of Text Categorization. Through this work we could also show that this holds for generating a search index. The additional overhead in storage and time is compensated by more efficient output. This work sheds some light on text document search in education, for both teaching and research.
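The distributional features named in the abstract can be made concrete with a small sketch. The abstract gives no formulas, so everything below is an assumption for illustration: the document is split into a fixed number of contiguous, nearly equal parts, positions are measured as part indices, and the position variance is the ordinary count-weighted variance of those indices. The names `split_into_parts`, `distributional_features`, and `num_parts` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class DistributionalFeatures:
    first_appearance: int      # index of the first part containing the term
    last_appearance: int       # index of the last part containing the term
    parts_with_term: int       # compactness: number of parts containing the term
    first_last_distance: int   # compactness: last_appearance - first_appearance
    position_variance: float   # compactness: variance of part indices of occurrences


def split_into_parts(tokens, num_parts):
    """Split tokens into num_parts contiguous, nearly equal parts (assumed scheme)."""
    size, extra = divmod(len(tokens), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        # The first `extra` parts take one additional token each.
        end = start + size + (1 if i < extra else 0)
        parts.append(tokens[start:end])
        start = end
    return parts


def distributional_features(tokens, term, num_parts=4):
    """Return the features for `term`, or None if the term never occurs."""
    counts = [part.count(term) for part in split_into_parts(tokens, num_parts)]
    total = sum(counts)
    if total == 0:
        return None
    present = [i for i, c in enumerate(counts) if c > 0]
    # Count-weighted mean and variance of the part indices.
    mean = sum(i * c for i, c in enumerate(counts)) / total
    variance = sum(c * (i - mean) ** 2 for i, c in enumerate(counts)) / total
    return DistributionalFeatures(
        first_appearance=present[0],
        last_appearance=present[-1],
        parts_with_term=len(present),
        first_last_distance=present[-1] - present[0],
        position_variance=variance,
    )


scattered = distributional_features("index a b c d e f index".split(), "index")
clustered = distributional_features("x index index y z w v u".split(), "index")
print(scattered)
print(clustered)
```

Both toy documents contain the term twice, so a pure frequency weight cannot tell them apart. The compactness values differ: the scattered document has a larger first-to-last distance and position variance than the clustered one, which is the kind of signal the abstract attributes to distributional features.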

References (13)
Man Lan, Sam-Yuan Sung, Hwee-Boon Low, Chew-Lim Tan. A comparative study on term weighting schemes for text categorization. International Joint Conference on Neural Networks, vol. 1, pp. 546-551 (2005). DOI: 10.1109/IJCNN.2005.1555890
Ron Bekkerman, Ran El-Yaniv, Naftali Tishby, Yoad Winter. Distributional word clusters vs. words for text categorization. Journal of Machine Learning Research, vol. 3, pp. 1183-1208 (2003).
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval (1983).
Xiao-Bing Xue, Zhi-Hua Zhou. Distributional Features for Text Categorization. IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 428-442 (2009). DOI: 10.1109/TKDE.2008.166
Abdelmoula El-Hamdouchi, Peter Willett. An improved algorithm for the calculation of exact term discrimination values. Information Processing and Management, vol. 24, pp. 17-22 (1988). DOI: 10.1016/0306-4573(88)90073-8
Gerard Salton, Christopher Buckley. Term Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, vol. 24, pp. 513-523 (1988). DOI: 10.1016/0306-4573(88)90021-0
Peter Willett. An algorithm for the calculation of exact term discrimination values. Information Processing and Management, vol. 21, pp. 225-232 (1985). DOI: 10.1016/0306-4573(85)90107-4
Carolyn J. Crouch. An analysis of approximate versus exact discrimination values. Information Processing and Management, vol. 24, pp. 5-16 (1988). DOI: 10.1016/0306-4573(88)90072-6
James P. Callan. Passage-level evidence in document retrieval. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 302-310 (1994). DOI: 10.5555/188490.188589