Complex Word Identification: Convolutional Neural Network vs. Feature Engineering

Authors: Segun Taofeek Aroyehun, Jason Angel, Daniel Alejandro Pérez Alvarez, Alexander Gelbukh

DOI: 10.18653/V1/W18-0538

Abstract: We describe the systems of the NLP-CIC team that participated in the Complex Word Identification (CWI) 2018 shared task. The shared task aimed to benchmark approaches for identifying complex words in English and other languages from the perspective of non-native speakers. Our goal is to compare two approaches: feature engineering and a deep neural network. Both approaches achieved comparable performance on the English test set. We demonstrated the flexibility of the deep-learning approach by using the same deep neural network setup in the Spanish track. Our systems achieved competitive results: all our systems were within 0.01 of the system with the best macro-F1 score on the test sets, except on the Wikipedia test set, on which our best system is 0.04 below the best macro-F1 score.
