Multilingual Probing of Deep Pre-Trained Contextual Encoders

Authors: Erik Velldal, Lilja Øvrelid, Vinit Ravishankar, Memduh Gökırmak

Abstract: Encoders that generate representations based on context have, in recent years, benefited from adaptations that allow for pre-training on large text corpora. Earlier work on evaluating fixed-length sentence representations has included the use of ‘probing’ tasks, which use diagnostic classifiers to attempt to quantify the extent to which these encoders capture specific linguistic phenomena. The principle of probing has also led to extended evaluations that include relatively newer word-level pre-trained encoders. We build on probing tasks established in the literature and comprehensively evaluate and analyse, from a typological perspective amongst others, multilingual variants of existing encoders on probing datasets constructed for 6 non-English languages. Specifically, we probe each layer of multiple monolingual RNN-based ELMo models, the cased and uncased multilingual variants of the transformer-based BERT, and a variant of BERT that uses a cross-lingual modelling scheme (XLM).
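To make the layer-wise probing setup concrete, here is a minimal sketch, not the authors' implementation. It assumes a binary probing task, mean-pooling of token vectors into one fixed-length sentence vector, and a logistic-regression diagnostic classifier trained separately on each layer. The function names (mean_pool, train_logreg, probe_each_layer, make_toy_data) are hypothetical. The per-layer vectors are synthetic stand-ins for the hidden states one would extract from ELMo, BERT or XLM.

```python
import math
import random


def mean_pool(token_vectors):
    """Average a sentence's token vectors into one fixed-length vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(xs, ys, epochs=200, lr=0.1):
    """Binary logistic regression (the assumed diagnostic classifier), full-batch gradient descent."""
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for d in range(dim):
                grad_w[d] += err * x[d]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of examples whose predicted label (score >= 0 means 1) is correct."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
        correct += int(pred == y)
    return correct / len(xs)


def probe_each_layer(sentences, labels, train_frac=0.8):
    """Train one probe per layer; sentences[i][layer] is a list of token vectors."""
    num_layers = len(sentences[0])
    split = int(len(sentences) * train_frac)
    scores = []
    for layer in range(num_layers):
        feats = [mean_pool(s[layer]) for s in sentences]
        w, b = train_logreg(feats[:split], labels[:split])
        scores.append(accuracy(w, b, feats[split:], labels[split:]))
    return scores


def make_toy_data(n=200, dim=4, seed=0):
    """Hypothetical two-layer 'encoder' output: layer 0 is noise, layer 1 encodes the label."""
    rng = random.Random(seed)
    sentences, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        length = rng.randint(3, 8)
        layer0 = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(length)]
        layer1 = []
        for _ in range(length):
            vec = [rng.gauss(0, 1) for _ in range(dim)]
            vec[0] += 2.0 if y == 1 else -2.0  # label signal in dimension 0 only
            layer1.append(vec)
        sentences.append([layer0, layer1])
        labels.append(y)
    return sentences, labels


if __name__ == "__main__":
    sentences, labels = make_toy_data()
    for layer, acc in enumerate(probe_each_layer(sentences, labels)):
        print(f"layer {layer}: held-out probing accuracy {acc:.2f}")
```

In the toy data, layer 1 encodes the label in one dimension while layer 0 is pure noise, so the probe should score much higher on layer 1. With real encoders, comparing such per-layer scores is what would indicate where a linguistic property is most easily recovered by a simple classifier.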