Deep Neural Networks for Czech Multi-label Document Classification

Authors: Ladislav Lenc, Pavel Král

DOI: 10.1007/978-3-319-75487-1_36

Keywords: Computer science, Set (abstract data type), Machine learning, Czech, Baseline (configuration management), Artificial intelligence, Document classification, Perceptron, Deep neural networks

Abstract: This paper is focused on automatic multi-label document classification of Czech text documents. The current approaches usually use some pre-processing which can have a negative impact (loss of information, additional implementation work, etc.). Therefore, we would like to omit it and use deep neural networks that learn from simple features. This choice was motivated by their successful usage in many other machine learning fields. Two different networks are compared: the first one is a standard multi-layer perceptron, while the second one is a popular convolutional network. The experiments on a Czech newspaper corpus show that both networks significantly outperform a baseline method which uses a rich set of features with a maximum entropy classifier. We also show that the convolutional network gives the best results.
