The Use of Unlabeled Data Versus Labeled Data for Stopping Active Learning for Text Classification

Authors: Garrett Beatty, Ethan Kochis, Michael Bloodgood

DOI: 10.1109/ICOSC.2019.8665546

Abstract: Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique to reduce the amount of training data one needs to label. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources for informing when to stop active learning are an additional labeled set of data, an unlabeled set of data, and the training data that is labeled during the process of active learning. To date, no one has compared and contrasted the advantages and disadvantages of stopping methods based on these three information sources. We find that stopping methods that use unlabeled data are more effective than methods that use labeled data.
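To make the idea of an unlabeled-data stopping method concrete, here is a minimal sketch of one generic family: measure how much successive models agree (via Cohen's kappa) on their predictions for a fixed unlabeled "stop set", and stop once agreement stays high. This is an illustration under stated assumptions, not necessarily the method evaluated in the paper. The function names `cohens_kappa` and `should_stop`, the threshold of 0.99, and the window of 3 rounds are hypothetical, illustrative choices.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of stop-set items labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each model's marginal label frequencies.
    p_e = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        # Both models predicted the same single label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def should_stop(stop_set_predictions: list[list[int]],
                threshold: float = 0.99, window: int = 3) -> bool:
    """Return True once the last `window` kappas between consecutive
    models' stop-set predictions all reach `threshold`.

    Each inner list holds one round's predictions on the SAME unlabeled
    stop set; threshold and window are illustrative assumptions.
    """
    if len(stop_set_predictions) < window + 1:
        return False  # not enough rounds to compare yet
    recent = stop_set_predictions[-window - 1:]
    kappas = [cohens_kappa(prev, cur) for prev, cur in zip(recent, recent[1:])]
    return all(k >= threshold for k in kappas)


# Hypothetical predictions of the model after each active learning round.
history = [
    [0, 1, 1, 0, 1, 0],  # round 1
    [0, 1, 1, 1, 1, 0],  # round 2
    [0, 1, 1, 1, 1, 0],  # round 3
    [0, 1, 1, 1, 1, 0],  # round 4
    [0, 1, 1, 1, 1, 0],  # round 5
]
print(should_stop(history[:4]))  # False: rounds 1 and 2 disagree (kappa = 2/3)
print(should_stop(history))      # True: rounds 2 to 5 agree perfectly
```

Because the stop set needs no labels, this check adds no annotation cost; a labeled-data alternative would instead track a metric such as accuracy on a separately labeled held-out set, which must itself be annotated.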