Multi-label Document Classification in Czech

Authors: Michal Hrala, Pavel Král

DOI: 10.1007/978-3-642-40585-3_44

Abstract: This paper deals with multi-label automatic document classification in the context of a real application for the Czech news agency. The main goal of this work is to compare and evaluate the three most promising approaches on the Czech language. We show that a simple method based on a meta-classifier, proposed by Zhu et al., significantly outperforms the other approaches. The error rate improvement is about 13%. The corpus is available for research purposes for free, which is another contribution of this work.

References (16)
Michal Hrala, Pavel Král, Evaluation of the Document Classification Approaches. Computer Recognition Systems, pp. 877-885 (2013), DOI: 10.1007/978-3-319-00969-8_86
Luigi Galavotti, Fabrizio Sebastiani, Maria Simi, Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization. European Conference on Research and Advanced Technology for Digital Libraries, vol. 1923, pp. 59-68 (2000), DOI: 10.1007/3-540-45268-0_6
Jana Novovičová, Antonín Malík, Pavel Pudil, Feature Selection Using Improved Mutual Information for Text Classification. Lecture Notes in Computer Science, pp. 1010-1017 (2004), DOI: 10.1007/978-3-540-27868-9_111
Jana Novovičová, Petr Somol, Michal Haindl, Pavel Pudil, Conditional Mutual Information Based Feature Selection for Classification Task. Iberoamerican Congress on Pattern Recognition, pp. 417-426 (2007), DOI: 10.1007/978-3-540-76725-1_44
Jiali Yun, Liping Jing, Jian Yu, Houkuan Huang, A Multi-layer Text Classification Framework Based on Two-level Representation Model. Expert Systems With Applications, vol. 39, pp. 2035-2046 (2012), DOI: 10.1016/J.ESWA.2011.08.027
Shenghuo Zhu, Xiang Ji, Wei Xu, Yihong Gong, Multi-labelled Classification Using Maximum Entropy Method. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 274-281 (2005), DOI: 10.1145/1076034.1076082
Juan Carlos Gomez, Marie-Francine Moens, PCA Document Reconstruction for Email Classification. Computational Statistics & Data Analysis, vol. 56, pp. 741-751 (2012), DOI: 10.1016/J.CSDA.2011.09.023
Andrej Bratko, Bogdan Filipič, Exploiting Structural Information for Semi-structured Document Categorization. Information Processing and Management, vol. 42, pp. 679-694 (2006), DOI: 10.1016/J.IPM.2005.06.003
Chul Su Lim, Kong Joo Lee, Gil Chang Kim, Multiple Sets of Features for Automatic Genre Classification of Web Documents. Information Processing and Management, vol. 41, pp. 1263-1276 (2005), DOI: 10.1016/J.IPM.2004.06.004
J. Scott Olsson, Douglas W. Oard, Jan Hajič, Cross-language Text Classification. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 645-646 (2005), DOI: 10.1145/1076034.1076170