A General Framework of Feature Selection for Text Categorization

Authors: Hongfang Jing, Bin Wang, Yahui Yang, Yan Xu

Keywords: DBFS, Pattern recognition, Text categorization, Feature selection, Artificial intelligence, Machine learning, Computer science, Information gain

Abstract: Many feature selection methods have been proposed for text categorization. However, their performance is usually verified by experiments, so the results rely on the corpora used and may not be accurate. This paper proposes a novel framework called Distribution-Based Feature Selection (DBFS), based on the distribution difference of features. The framework generalizes most state-of-the-art feature selection methods, including OCFS, MI, ECE, IG, CHI, and OR. The performance of many feature selection methods can be estimated by theoretical analysis using the components of this framework. Besides, DBFS sheds light on the merits and drawbacks of existing methods. In addition, the framework helps to select suitable feature selection methods for specific domains. Moreover, a weighted model is given so that feature selection methods suitable for unbalanced datasets can be derived. Experimental results show that they are more effective than CHI, IG, and OCFS on both balanced and unbalanced datasets.

