Authors: Segun Taofeek Aroyehun, Alexander F Gelbukh
DOI:
Keywords:
Abstract: This paper presents an evaluation of intermediate pretraining for the task of offensive language identification. We leverage recent advances in multilingual contextual representations and in fine-tuning pretrained language models. We compare the performance of a pretrained language model adapted to the social media domain with that of another model further trained on multilingual sentiment analysis data. We found that intermediate pretraining steps prior to fine-tuning on the target task yield performance gains. The best submissions by our team, NLP-CIC, achieved first and second place on the non-contextual Spanish (Subtask 1) and Mexican Spanish (Subtask 3) subtasks of the MeOffendEs-IberLEF 2021 shared task, respectively.
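The abstract describes a two-stage recipe: start from a checkpoint that has already received intermediate pretraining (domain adaptation on social media text, optionally followed by training on multilingual sentiment data), then fine-tune it on the target offensive-language task. Below is a minimal sketch of that second stage using the Hugging Face Transformers API. The checkpoint name and the toy dataset are illustrative assumptions, not the authors' exact configuration or data.

```python
# Minimal sketch: fine-tuning an intermediately pretrained multilingual model
# for binary offensive-language identification. Assumptions are marked below.
import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed checkpoint for illustration: a multilingual model already adapted
# to social media text, so the intermediate pretraining is baked in.
CHECKPOINT = "cardiffnlp/twitter-xlm-roberta-base"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)


class OffenseDataset(Dataset):
    """Wraps (text, label) pairs as tensors for sequence classification."""

    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=128)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


# Hypothetical toy examples standing in for the MeOffendEs training data.
train_ds = OffenseDataset(
    ["ejemplo de texto ofensivo", "ejemplo de texto neutral"],
    [1, 0],
)

args = TrainingArguments(
    output_dir="offense-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    logging_steps=1,
)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```

Swapping `CHECKPOINT` for a variant further trained on multilingual sentiment data would correspond to the second intermediate-pretraining setup the abstract compares; the fine-tuning step itself is unchanged.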