A hybrid approach to Arabic named entity recognition

Authors: Khaled Shaalan, Mai Oudah

DOI: 10.1177/0165551513502417

Keywords: Named-entity recognition, Arabic, Artificial intelligence, Bottleneck, Phone, Computer science, Natural language processing, Information extraction, Hybrid approach, Decision tree, Support vector machine

Abstract: In this paper, we propose a hybrid named entity recognition (NER) approach that takes advantage of both rule-based and machine learning-based (ML-based) approaches in order to improve overall system performance and to overcome the knowledge elicitation bottleneck of rule-based approaches and the lack of resources for underdeveloped languages that require deep language processing, such as Arabic. The complexity of Arabic poses special challenges to researchers of Arabic NER, which is essential for both monolingual and multilingual applications. We used the hybrid approach to develop an Arabic NER system capable of recognizing 11 types of Arabic named entities: Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments were conducted using decision trees, Support Vector Machines and logistic regression classifiers to evaluate the system's performance. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches when they are processed independently. More importantly, our hybrid approach outperforms the state of the art in Arabic NER in terms of accuracy when applied to the ANERcorp standard dataset, with F-measures of 0.94 for Person, 0.90 for Location and 0.88 for Organization.
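
The F-measures quoted in the abstract combine precision and recall into a single score. As a rough illustration only, the sketch below computes an entity-level F-measure in Python. The function name `f_measure`, the `(start, end, type)` span representation, the exact-match criterion and the toy data are all assumptions made for this example; they are not taken from the paper and may differ from the evaluation scheme the authors used.

```python
def f_measure(gold, predicted, beta=1.0):
    """Return (precision, recall, F) for two sets of entity spans.

    Each entity is a (start, end, type) tuple. An entity counts as
    correct only if span and type match exactly; this is an assumption
    of the sketch, not necessarily the paper's scoring scheme.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical toy annotations: token offsets plus an entity type.
gold = {(0, 2, "Person"), (5, 6, "Location"), (9, 11, "Organization")}
predicted = {(0, 2, "Person"), (5, 6, "Location"), (7, 8, "Person")}

p, r, f = f_measure(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a system has to do well on both to score highly. Per-type figures such as those reported for Person, Location and Organization would be obtained by filtering both sets to one entity type before calling the function.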
