Text Categorization Using an Automatically Generated Labelled Dataset: An Evaluation Study

Authors: Dengya Zhu, Kok Wai Wong

Keywords: Benchmark (computing), Set (abstract data type), Feature (machine learning), Computer science, Boosting methods for object categorization, Naive Bayes classifier, Text categorization, Feature selection, AdaBoost, Machine learning, Artificial intelligence

Abstract: Naive Bayes (NB), kNN and AdaBoost are three commonly used text classifiers. Evaluating these classifiers involves a variety of factors, including the benchmark used, feature selection, the parameter settings of the algorithms, and the measurement criteria employed. Researchers have demonstrated that some algorithms outperform others on some corpora; however, corpus labelling and bias are two concerns in text categorization. This paper focuses on evaluating these classifiers using an automatically generated text document set that is labelled by a group of experts to alleviate the subjectiveness of labelling, and at the same time examines how performance is influenced by feature selection and the number of features selected.

