Reduction of Training Noises for Text Classifiers

Author: Rey-Long Liu

DOI: 10.1007/978-3-642-36543-0_4

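Abstract: Automatic text classification (TC) is essential for the archiving and retrieval of texts, which are the main ways of recording information and expertise. Previous studies have thus developed many text classifiers. They often employed training texts to build the classifiers, and showed that the classifiers had good performance in various application domains. However, as training texts are inevitably unsound or incomplete in practice, they often contain terms not related to the categories of interest. Such terms are actually noises in classifier training, and hence can deteriorate the performance of the classifiers. Reduction of the training noises is thus essential. It is also quite challenging, as the training texts are inevitably unsound or incomplete. In this paper, we develop a technique, TNR (Training Noise Reduction), to remove the possible training noises so that the performance of the classifiers can be further improved. Given a training text d of a category c, TNR identifies a sequence of consecutive terms (in d) as noises if the terms are not strongly related to c. A case study on the classification of Chinese disease texts shows that TNR can improve a Support Vector Machine (SVM) classifier, which is a state-of-the-art classifier in TC. The contribution is of significance to the further enhancement of existing text classifiers.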
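
The idea of dropping runs of consecutive weakly related terms can be illustrated with a minimal Python sketch. This is not the paper's TNR algorithm: the abstract does not say how relatedness is measured or how long a run must be, so the scoring function `term_strength` (the share of documents containing a term that belong to the category), the `threshold`, and the `min_run` parameter are all assumptions introduced here for illustration.

```python
from collections import Counter
from typing import Dict, List


def term_strength(docs_by_cat: Dict[str, List[List[str]]], category: str) -> Dict[str, float]:
    """Hypothetical relatedness score (an assumption, not the paper's measure):
    the fraction of documents containing a term that belong to `category`."""
    in_cat: Counter = Counter()
    total: Counter = Counter()
    for cat, docs in docs_by_cat.items():
        for doc in docs:
            for term in set(doc):  # count each term once per document
                total[term] += 1
                if cat == category:
                    in_cat[term] += 1
    return {t: in_cat[t] / total[t] for t in total}


def reduce_noise(doc: List[str], strength: Dict[str, float],
                 threshold: float = 0.5, min_run: int = 2) -> List[str]:
    """Drop every run of at least `min_run` consecutive terms whose score
    is below `threshold`; shorter weak runs are kept in place."""
    kept: List[str] = []
    run: List[str] = []  # current run of weakly related terms
    for term in doc:
        if strength.get(term, 0.0) < threshold:
            run.append(term)
        else:
            if len(run) < min_run:
                kept.extend(run)  # run too short to be treated as noise
            run = []
            kept.append(term)
    if len(run) < min_run:  # handle a weak run at the end of the text
        kept.extend(run)
    return kept


# Toy data, invented purely for illustration.
docs_by_cat = {
    "flu": [["fever", "cough", "the", "clinic"], ["fever", "virus"]],
    "diabetes": [["insulin", "the", "clinic"], ["glucose", "insulin", "the"]],
}
strength = term_strength(docs_by_cat, "flu")
text = ["fever", "the", "clinic", "open", "hours", "cough", "the", "virus"]
cleaned = reduce_noise(text, strength, threshold=0.6, min_run=2)
```

In this toy run the four-term sequence "the clinic open hours" is removed as a block, while the isolated "the" between "cough" and "virus" is kept because it does not form a run of the assumed minimum length. The cleaned texts would then be passed to whatever classifier is being trained, such as an SVM.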
