Authors: Khaled Shaalan, Mai Oudah
Keywords: Named-entity recognition, Arabic, Artificial intelligence, Bottleneck, Phone, Computer science, Natural language processing, Information extraction, Hybrid approach, Decision tree, Support vector machine
Abstract: In this paper, we propose a hybrid named entity recognition (NER) approach that combines the advantages of rule-based and machine learning-based (ML-based) approaches in order to improve overall system performance and to overcome both the knowledge elicitation bottleneck and the lack of resources for underdeveloped languages that require deep language processing, such as Arabic. The complexity of Arabic poses special challenges to researchers working on NER, which is essential for both monolingual and multilingual applications. We used the hybrid approach to develop an NER system capable of recognizing 11 types of entities: Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments were conducted using decision trees, Support Vector Machines and logistic regression classifiers to evaluate the system's performance. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches when they are processed independently. More importantly, our system outperforms the state of the art in terms of accuracy when applied to the ANERcorp standard dataset, with F-measures of 0.94 for Person, 0.90 for Location and 0.88 for Organization.
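The abstract reports per-entity F-measures without restating the definition. As a reminder of what those figures summarize, here is a minimal sketch of the balanced F-measure (F1), assuming the standard harmonic mean of precision and recall. The function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives  # entities the system tagged
    actual = true_positives + false_negatives     # entities in the gold standard
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0  # no correct predictions, so the score is defined as zero
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single entity type (illustrative only).
score = f_measure(true_positives=90, false_positives=10, false_negatives=10)
```

A per-type score such as the 0.94 reported for Person would come from applying this kind of calculation to the counts for that entity type alone.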