Author: Rey-Long Liu
DOI: 10.1007/978-3-642-36543-0_4
Keywords:
Abstract: Automatic text classification (TC) is essential for the archiving and retrieval of texts, which are the main ways of recording information and expertise. Previous studies have thus developed many text classifiers. They often employed training texts to build the classifiers, and showed that the classifiers had good performance in various application domains. However, as the training texts are inevitably unsound or incomplete in practice, they often contain many terms not related to the categories of interest. Such terms are actually noises in classifier training, and hence can deteriorate the performance of the classifiers. Reduction of the noises is essential. It is also quite challenging when the training texts are unsound or incomplete. In this paper, we develop a technique, TNR (Training Noise Reduction), to remove the possible training noises so that the performance of the classifiers can be further improved. Given a training text d of a category c, TNR identifies a sequence of consecutive terms (in d) as noises if the terms are not strongly related to c. A case study on the classification of Chinese disease-related texts shows that TNR can improve a Support Vector Machine (SVM) classifier, which is a state-of-the-art classifier in TC. The contribution is of significance to the further enhancement of existing text classifiers.