An Effective Feature Selection Method for Text Categorization

Authors: Xipeng Qiu, Jinlong Zhou, Xuanjing Huang

DOI: 10.1007/978-3-642-20841-6_5

Abstract: Feature selection is an efficient strategy for reducing the dimensionality of data and removing noise in text categorization. However, most feature selection methods aim to remove non-informative features based on corpus statistics, which do not relate directly to classification accuracy. In this paper, we propose an effective feature selection method aimed at k-nearest neighbors (KNN). Our experiments show that our method performs better than traditional methods and that it also benefits other classifiers, such as Support Vector Machines (SVM).
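
To make the contrast in the abstract concrete, the following is a minimal sketch of the kind of corpus-statistics baseline it refers to. It is not the method proposed in the paper, which this record does not describe. The sketch assumes the chi-square statistic as the corpus statistic, cosine similarity for KNN, and a toy six-document corpus. All function names (`chi_square_scores`, `select_features`, `knn_predict`) and the data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes."""
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_sizes.items():
            # 2x2 contingency table of term presence against class membership.
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c = n_cls - a          # term absent, document in class
            d = n - a - b - c      # term absent, document not in class
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def vectorize(doc, features):
    """Bag-of-words counts restricted to the selected features."""
    return Counter(t for t in doc if t in features)


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Majority vote among the k training documents most similar to the query."""
    sims = sorted(
        ((cosine(query_vec, v), y) for v, y in zip(train_vecs, train_labels)),
        key=lambda pair: -pair[0],
    )
    votes = Counter(y for _, y in sims[:k])
    return votes.most_common(1)[0][0]


# Toy corpus (hypothetical data, for illustration only).
train_docs = [
    "the team won the match".split(),
    "the player scored one goal".split(),
    "the match ended with a late goal".split(),
    "the market fell on weak earnings".split(),
    "the bank raised interest rates".split(),
    "the stock market rallied as rates fell".split(),
]
train_labels = ["sport"] * 3 + ["finance"] * 3

features = select_features(train_docs, train_labels, k=5)
train_vecs = [vectorize(d, features) for d in train_docs]

sport_query = vectorize("a late goal decided the match".split(), features)
finance_query = vectorize("rates fell as the market slid".split(), features)
sport_pred = knn_predict(train_vecs, train_labels, sport_query)
finance_pred = knn_predict(train_vecs, train_labels, finance_query)
```

Note that the ranking step never consults the classifier: terms are scored from document counts alone, and KNN only sees the result afterwards. This is the gap the abstract points to when it says corpus statistics do not relate directly to classification accuracy.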
