Semantic Space Transformations for Cross-Lingual Document Classification

Authors: Jiří Martínek, Ladislav Lenc, Pavel Král

DOI: 10.1007/978-3-030-01418-6_60

Abstract: Cross-lingual document representation can be obtained by training monolingual semantic spaces and then using bilingual dictionaries with a transformation method to project word vectors into a unified space. The main goal of this paper is to evaluate three promising transformation methods on a cross-lingual document classification task. We also propose, evaluate, and compare two cross-lingual document classification approaches. We use a popular convolutional neural network (CNN) and compare its performance with a standard maximum entropy classifier. The proposed methods are evaluated on four languages, namely English, German, Spanish, and Italian, from the Reuters corpus. We demonstrate that the results of all transformation methods are close to each other; however, the orthogonal transformation gives generally slightly better results when a CNN with trained embeddings is used. The experimental results also show that the convolutional network achieves better results than the maximum entropy classifier. We further show that the proposed methods are competitive with the state of the art.
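
The orthogonal transformation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard orthogonal Procrustes formulation, which may differ in detail from what the paper implements. The function name `orthogonal_map`, the toy dimensions, and the synthetic "dictionary" of paired vectors are all hypothetical and stand in for real monolingual embeddings aligned through a bilingual dictionary.

```python
import numpy as np

def orthogonal_map(src, tgt):
    """Return the orthogonal W minimizing ||src @ W - tgt||_F.

    src and tgt hold one row per dictionary pair: the source-language
    word vector and the vector of its translation in the target space.
    """
    # Orthogonal Procrustes solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Toy data (an assumption for illustration): target vectors are random,
# and source vectors are the same points under a hidden rotation.
rng = np.random.default_rng(0)
dim, pairs = 5, 50
tgt_vecs = rng.normal(size=(pairs, dim))
hidden_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src_vecs = tgt_vecs @ hidden_rotation.T

# Learn the mapping from the paired vectors and project the source space.
W = orthogonal_map(src_vecs, tgt_vecs)
projected = src_vecs @ W
```

In this noise-free toy setting the learned `W` recovers the hidden rotation, so `projected` coincides with `tgt_vecs`. With real embeddings the fit is only approximate, and the projected vectors would then feed a classifier trained on the shared space.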
