Deep Neural Networks for Czech Multi-label Document Classification

Authors: Ladislav Lenc, Pavel Král

Keywords: Computer science, Set (abstract data type), Machine learning, Czech, Baseline (configuration management), Artificial intelligence, Document classification, Perceptron, Deep neural networks

Abstract: This paper focuses on automatic multi-label classification of Czech text documents. Current approaches usually rely on some pre-processing, which can have a negative impact (loss of information, additional implementation work, etc.). Therefore, we would like to omit it and use deep neural networks that learn from simple features. This choice was motivated by their successful usage in many other machine learning fields. Two different networks are compared: the first one is a standard multi-layer perceptron, while the second one is a popular convolutional network. Experiments on a Czech newspaper corpus show that both networks significantly outperform a baseline method which uses a rich set of features with a maximum entropy classifier. We have also shown that the convolutional network gives the best results.
