Authors: Ladislav Lenc, Pavel Král
DOI: 10.1007/978-3-319-75487-1_36
Keywords: Computer science, Set (abstract data type), Machine learning, Czech, Baseline (configuration management), Artificial intelligence, Document classification, Perceptron, Deep neural networks
Abstract: This paper is focused on automatic multi-label document classification of Czech text documents. Current approaches usually use some pre-processing, which can have a negative impact (loss of information, additional implementation work, etc.). Therefore, we would like to omit it and use deep neural networks that learn from simple features. This choice was motivated by their successful usage in many other machine learning fields. Two different networks are compared: the first one is a standard multi-layer perceptron, while the second one is a popular convolutional network. The experiments on a Czech newspaper corpus show that both networks significantly outperform a baseline method that uses a rich set of features with a maximum entropy classifier. We also show that the convolutional network gives the best results.
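
To make the multi-label setting concrete, the sketch below shows how a small multi-layer perceptron can assign several labels to one document: each label gets its own sigmoid output, and every label whose score reaches a threshold is kept. This is only an illustration under stated assumptions, not the authors' architecture. The toy weights, the three-word presence vector, the label names, and the 0.5 threshold are all hypothetical and chosen by hand so the arithmetic is easy to follow.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def predict_labels(features, hidden_w, hidden_b, out_w, out_b, labels,
                   threshold=0.5):
    """Return every label whose independent sigmoid score meets the threshold.

    Unlike a softmax, the sigmoids do not compete with each other, so a
    document can receive zero, one, or several labels.
    """
    hidden = relu(dense(features, hidden_w, hidden_b))
    scores = [sigmoid(z) for z in dense(hidden, out_w, out_b)]
    return [label for label, score in zip(labels, scores) if score >= threshold]


# Hypothetical toy setup: 3 input features (word presence), 2 hidden units,
# 3 output labels. All values are hand-picked, not learned.
LABELS = ["sport", "politics", "culture"]
HIDDEN_W = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 1.0]]
HIDDEN_B = [0.0, 0.0]
OUT_W = [[2.0, 0.0],
         [0.0, 2.0],
         [-2.0, -2.0]]
OUT_B = [0.0, 0.0, 0.0]

document = [1.0, 0.0, 1.0]  # first and third vocabulary words are present
predicted = predict_labels(document, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B, LABELS)
print(predicted)
```

With these hand-picked weights, the hidden layer outputs [1.0, 1.0], the output scores before the sigmoid are [2, 2, -4], and the document is therefore tagged with the first two labels but not the third. In practice the weights would be learned from labelled documents rather than written by hand.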