Neural Networks for Multi-lingual Multi-label Document Classification

Authors: Jiří Martínek, Ladislav Lenc, Pavel Král

DOI: 10.1007/978-3-030-01418-6_8

Abstract: This paper proposes a novel approach for multi-lingual multi-label document classification based on neural networks. We use popular convolutional networks for this task with three different configurations. The first one uses static word2vec embeddings that are left as is, while the second one initializes the embeddings with word2vec and fine-tunes them while learning on the available data. The last method initializes the embeddings randomly, and they are then optimized for the task. The proposed approach is evaluated on four languages, namely English, German, Spanish and Italian, from the Reuters corpus. Experimental results show that the approach is efficient: the best obtained F-measure reaches 84%.
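The three embedding configurations and the multi-label evaluation described in the abstract can be sketched as follows. This is a minimal, stdlib-only illustration, not the authors' implementation: the convolutional layers are omitted, and the mode names (`STATIC`, `FINE_TUNED`, `RANDOM`), the helper functions, the uniform [-0.25, 0.25] random initialization, the 0.5 sigmoid threshold and the use of micro-averaged F-measure are all assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Sequence, Set

# Hypothetical mode names for the three configurations in the abstract.
STATIC, FINE_TUNED, RANDOM = "static", "fine-tuned", "random"

Table = Dict[str, List[float]]


def init_embeddings(vocab: Sequence[str], pretrained: Table, dim: int,
                    mode: str, seed: int = 0) -> Table:
    """Build the word-embedding table for one configuration."""
    rng = random.Random(seed)
    table: Table = {}
    for word in vocab:
        if mode in (STATIC, FINE_TUNED) and word in pretrained:
            # Start from the pre-trained (e.g. word2vec) vector.
            table[word] = list(pretrained[word])
        else:
            # RANDOM mode, or a word missing from the pre-trained vocabulary:
            # uniform init in [-0.25, 0.25] (an assumed range).
            table[word] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return table


def sgd_update(table: Table, word: str, grad: Sequence[float],
               lr: float, mode: str) -> None:
    """One plain SGD step on a word vector; STATIC tables are never changed."""
    if mode == STATIC:
        return  # static embeddings are left as is
    table[word] = [w - lr * g for w, g in zip(table[word], grad)]


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Multi-label decision: independent sigmoid per label, then a threshold.

    sigmoid(z) >= t is equivalent to z >= log(t / (1 - t)), which avoids
    overflow in math.exp for very negative logits. Requires 0 < t < 1.
    """
    logit_threshold = math.log(threshold / (1.0 - threshold))
    return {i for i, z in enumerate(logits) if z >= logit_threshold}


def micro_f_measure(gold: Sequence[Set[int]], pred: Sequence[Set[int]]) -> float:
    """Micro-averaged F-measure: pool TP/FP/FN over all documents and labels."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pretrained = {"oil": [0.3, -0.1], "market": [0.1, 0.2]}
    vocab = ["oil", "market", "opec"]  # "opec" has no pre-trained vector here
    for mode in (STATIC, FINE_TUNED, RANDOM):
        table = init_embeddings(vocab, pretrained, dim=2, mode=mode)
        sgd_update(table, "oil", grad=[1.0, 1.0], lr=0.1, mode=mode)
        print(mode, table["oil"])
    gold = [{0, 2}, {1}]
    pred = [predict_labels([2.0, -1.0, 0.3]), predict_labels([-2.0, 1.5, 0.1])]
    print(pred, round(micro_f_measure(gold, pred), 4))
```

In this sketch only the embedding initialization and whether the vectors receive updates differ between the modes, so the rest of the network could stay identical across the three configurations. A document may receive any number of labels, which is why each label gets its own sigmoid and threshold instead of a single softmax choice.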

References (17)
David Martin Ward Powers. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. Journal of Machine Learning Technologies, vol. 2, pp. 37-63 (2011).
Jinseok Nam, Jungi Kim, Eneldo Loza Mencía, Iryna Gurevych, Johannes Fürnkranz. Large-scale multi-label text classification: revisiting neural networks. European Conference on Machine Learning, pp. 437-452 (2014). DOI: 10.1007/978-3-662-44851-9_28
Vinod Nair, Geoffrey E. Hinton. Rectified Linear Units Improve Restricted Boltzmann Machines. International Conference on Machine Learning, pp. 807-814 (2010).
Yoon Kim. Convolutional Neural Networks for Sentence Classification. Empirical Methods in Natural Language Processing, pp. 1746-1751 (2014). DOI: 10.3115/V1/D14-1181
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, vol. 15, pp. 1929-1958 (2014).
Will Y. Zou, Richard Socher, Daniel Cer, Christopher D. Manning. Bilingual Word Embeddings for Phrase-Based Machine Translation. Empirical Methods in Natural Language Processing, pp. 1393-1398 (2013).
Min-Ling Zhang, Zhi-Hua Zhou. Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization. IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 1338-1351 (2006). DOI: 10.1109/TKDE.2006.162
Quoc Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. International Conference on Machine Learning, vol. 4, pp. 1188-1196 (2014).
Tomáš Kočiský, Karl Moritz Hermann, Phil Blunsom. Learning Bilingual Word Representations by Marginalizing Alignments. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 224-229 (2014). DOI: 10.3115/V1/P14-2037
David D. Lewis, Yiming Yang, Tony G. Rose, Fan Li. RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, vol. 5, pp. 361-397 (2004). DOI: 10.5555/1005332.1005345