Collaborative Data Cleaning for Sentiment Classification with Noisy Training Corpus

Author: Xiaojun Wan

DOI: 10.1007/978-3-642-20841-6_27

Abstract: A labeled review corpus is a very valuable resource for the task of sentiment classification of product reviews. Fortunately, there is a large number of product reviews on the Web, and each review is associated with a tag assigned by users to indicate its polarity orientation. We can download such reviews with their tags and use them as a training corpus for sentiment classification. However, users may assign the tags arbitrarily and inaccurately, and some tags are not appropriate, so the automatically constructed corpus contains many noisy instances, and these noisy instances deteriorate classification performance. In this paper, we propose the co-cleaning and tri-cleaning algorithms to collaboratively clean the corpus and thus improve sentiment classification performance. The proposed algorithms use multiple classifiers to iteratively select and remove the most confidently noisy instances from the corpus. Experimental results verify the effectiveness of our algorithms, and the tri-cleaning algorithm is the most effective and promising.
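The abstract describes the cleaning loop only at a high level, so the following is a minimal sketch of the general idea rather than the paper's co-cleaning or tri-cleaning procedure. It assumes two hypothetical Naive Bayes classifiers (one over words, one over character trigrams), a made-up agreement rule (an instance is a suspect only if every classifier assigns more than `threshold` probability to the label opposite its tag), and a toy corpus invented for illustration. The names `NaiveBayes`, `collaborative_clean`, `rounds`, `per_round`, and `threshold` are all assumptions, not taken from the paper.

```python
import math
from collections import Counter


def words(text):
    return text.lower().split()


def char_trigrams(text):
    s = text.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (labels 0/1)."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, y in zip(texts, labels):
            self.counts[y].update(self.featurize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob(self, text, y):
        """Posterior probability P(y | text)."""
        logp = {}
        for c in (0, 1):
            total = sum(self.counts[c].values()) + len(self.vocab)
            lp = math.log((self.docs[c] + 1) / (sum(self.docs.values()) + 2))
            for f in self.featurize(text):
                if f in self.vocab:
                    lp += math.log((self.counts[c][f] + 1) / total)
            logp[c] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return math.exp(logp[y] - m) / z


def collaborative_clean(texts, labels, featurizers,
                        rounds=5, per_round=1, threshold=0.6):
    """Iteratively drop instances that all classifiers confidently dispute."""
    keep = list(range(len(texts)))
    removed = []
    for _ in range(rounds):
        cur_texts = [texts[i] for i in keep]
        cur_labels = [labels[i] for i in keep]
        # Retrain every classifier on the current (partly cleaned) corpus.
        models = [NaiveBayes(f).fit(cur_texts, cur_labels) for f in featurizers]
        suspects = []
        for i in keep:
            # Each classifier's confidence that the assigned tag is wrong.
            conf = [m.prob(texts[i], 1 - labels[i]) for m in models]
            if min(conf) > threshold:
                suspects.append((min(conf), i))
        if not suspects:
            break  # nothing left that all classifiers dispute
        suspects.sort(reverse=True)
        for _, i in suspects[:per_round]:
            keep.remove(i)
            removed.append(i)
    return keep, removed


# Toy corpus (invented): index 6 is a positive review tagged negative.
texts = ["good good great", "great good good", "good great great",
         "bad awful bad", "awful bad bad", "bad awful awful",
         "good great good"]
labels = [1, 1, 1, 0, 0, 0, 0]

keep, removed = collaborative_clean(texts, labels, [words, char_trigrams])
print(removed)  # [6]
```

Requiring agreement among all classifiers before removal is one conservative design choice: it trades recall of noisy instances for a lower risk of discarding correctly tagged reviews. Removing only a few instances per round and retraining lets later rounds benefit from the cleaner corpus.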

References (37)
Nigel Collier, Tony Mullen. Sentiment Analysis using Support Vector Machines with Diverse Information Sources. Empirical Methods in Natural Language Processing, pp. 412-418 (2004).
Advances in Information Retrieval Theory. Lecture Notes in Computer Science, vol. 5766 (2009). DOI: 10.1007/978-3-642-04417-5
Eleazar Eskin. Detecting errors within a corpus using anomaly detection. North American Chapter of the Association for Computational Linguistics, pp. 148-153 (2000).
Robert E. Schapire, Steven Abney, Yoram Singer. Boosting Applied to Tagging and PP Attachment. Empirical Methods in Natural Language Processing (1999).
Andrea Esuli, Fabrizio Sebastiani. Training Data Cleaning for Text Classification. International Conference on the Theory of Information Retrieval, pp. 29-41 (2009). DOI: 10.1007/978-3-642-04417-5_4
Jun Li, Maosong Sun. Experimental Study on Sentiment Classification of Chinese Review using Machine Learning Techniques. International Conference Natural Language Processing, pp. 393-400 (2007). DOI: 10.1109/NLPKE.2007.4368061
Markus Dickinson, W. Detmar Meurers. Detecting errors in part-of-speech annotation. Conference of the European Chapter of the Association for Computational Linguistics, pp. 107-114 (2003). DOI: 10.3115/1067807.1067823
Masaki Murata, Masao Utiyama, Kiyotaka Uchimoto, Hitoshi Isahara, Qing Ma. Correction of errors in a verb modality corpus for machine translation with a machine-learning method. ACM Transactions on Asian Language Information Processing (TALIP), vol. 4, pp. 18-37 (2005). DOI: 10.1145/1066078.1066080