Classification of Small Datasets: Why Using Class-Based Weighting Measures?

Authors: Flavien Bouillot, Pascal Poncelet, Mathieu Roche

DOI: 10.1007/978-3-319-08326-1_35

Keywords: Naive Bayes classifier, Classifier (UML), Classification methods, Artificial intelligence, Weighting, tf–idf, Pattern recognition, Computer science, Small number

Abstract: In text classification, providing an efficient classifier even when the number of documents involved in the learning step is small remains an important issue. In this paper we evaluate the performance of traditional classification methods to better understand their limitations in the learning phase when dealing with a small number of documents. We thus propose a new way of weighting the features used for classification. These weighting measures have been integrated into two well-known classifiers, Class-Feature-Centroid and Naive Bayes, and evaluations have been performed on real datasets. We have also investigated the influence of parameters such as the number of classes, documents, or words on classification. Experiments show the efficiency of our proposal relative to state-of-the-art methods. Whether with very little training data or with the little information that can be extracted from content-poor documents, we show that our approach performs well.
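The abstract does not spell out the proposed weighting, so the following is only a hypothetical illustration of what a class-based weighting measure can look like, not the authors' method. It is assumed here that a term's weight for a class combines how many of that class's documents contain the term with how few classes contain it at all, i.e. a tf–idf-style score computed over classes rather than over individual documents. The names `class_based_weights` and `classify`, the exact formula, and the toy data are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def class_based_weights(docs, labels):
    """Hypothetical class-based weighting (illustrative assumption only).

    weight(t, c) = (docs of class c containing t / docs in class c)
                   * log(number of classes / number of classes containing t)
    """
    classes = sorted(set(labels))
    docs_per_class = Counter(labels)        # number of training documents per class
    df = defaultdict(Counter)               # df[c][t]: docs of class c containing term t
    for tokens, c in zip(docs, labels):
        for t in set(tokens):               # count each term once per document
            df[c][t] += 1

    cf = Counter()                          # cf[t]: number of classes in which t occurs
    for c in classes:
        for t in df[c]:
            cf[t] += 1

    weights = {}
    for c in classes:
        weights[c] = {
            # inner-class frequency times inter-class rarity;
            # a term found in every class gets log(1) = 0
            t: (count / docs_per_class[c]) * math.log(len(classes) / cf[t])
            for t, count in df[c].items()
        }
    return weights


def classify(tokens, weights):
    """Score each class by summing its weights for the document's distinct terms."""
    terms = set(tokens)
    scores = {c: sum(w.get(t, 0.0) for t in terms) for c, w in weights.items()}
    return max(scores, key=scores.get)


# Toy usage with two classes and only two training documents each.
train = [
    ["goal", "match", "team", "today"],
    ["team", "score", "match"],
    ["vote", "election", "party", "today"],
    ["party", "vote", "minister"],
]
labels = ["sport", "sport", "politics", "politics"]
W = class_based_weights(train, labels)
print(classify(["match", "score", "today"], W))
```

Because the weights are built from per-class document counts rather than from individual documents, even two training examples per class are enough to produce a score, and a term shared by all classes (here `today`) receives weight zero.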
