Authors: Hongfang Jing, Bin Wang, Yahui Yang, Yan Xu
DOI: 10.1007/978-3-642-03070-3_49
Keywords: DBFS, Pattern recognition, Text categorization, Feature selection, Artificial intelligence, Machine learning, Computer science, Information gain
Abstract: Many feature selection methods have been proposed for text categorization. However, their performance is usually verified only by experiments, so the results depend on the corpora used and may not be accurate. This paper proposes a novel framework called Distribution-Based Feature Selection (DBFS) based on the distribution difference of features. It generalizes most state-of-the-art methods, including OCFS, MI, ECE, IG, CHI and OR. The performance of many methods can be estimated by theoretical analysis using the components of this framework. Besides, DBFS sheds light on the merits and drawbacks of existing methods. In addition, it helps to select suitable methods for specific domains. Moreover, a weighted model is given, from which feature selection methods suited to unbalanced datasets are derived. The experimental results show that they are more effective than CHI, IG and OCFS on both balanced and unbalanced datasets.
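The abstract compares DBFS against standard feature selection baselines such as CHI (chi-square) and MI (mutual information). The sketch below is not the paper's DBFS implementation; it is a minimal, hedged illustration of how those baseline selectors are commonly applied to a bag-of-words text matrix, assuming scikit-learn as the toolkit and a toy two-class corpus invented here for demonstration.

```python
# Illustration of baseline feature selection (CHI and MI) for text categorization.
# This is NOT the DBFS method from the paper; scikit-learn and the toy data are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

# Toy corpus with binary class labels (0 = finance, 1 = sports), purely for illustration.
docs = [
    "stock market prices rise",
    "central bank raises interest rates",
    "team wins the championship game",
    "coach praises players after the match",
]
labels = [0, 0, 1, 1]

# Build a bag-of-words term-document matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Rank terms by the chi-square statistic and keep the top k.
k = 5
chi_selector = SelectKBest(chi2, k=k).fit(X, labels)
print("CHI top terms:", [terms[i] for i in chi_selector.get_support(indices=True)])

# The same selection driven by mutual information instead of chi-square.
mi_selector = SelectKBest(mutual_info_classif, k=k).fit(X, labels)
print("MI top terms:", [terms[i] for i in mi_selector.get_support(indices=True)])
```

Such univariate scoring of terms against class labels is the setting in which the paper's distribution-based framework and its weighted variant for unbalanced datasets are evaluated.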