Authors: Sivaji Bandyopadhyay, Asif Ekbal
DOI:
Keywords: Bengali, Voting, Weighted voting, Support vector machine, Machine learning, Pattern recognition, Performance improvement, Principle of maximum entropy, Conditional random field, Named-entity recognition, Computer science, Artificial intelligence
Abstract: This paper reports how appropriate unlabeled data, post-processing, and voting can be effective in improving the performance of a Named Entity Recognition (NER) system. The proposed method is based on a combination of the following classifiers: Maximum Entropy (ME), Conditional Random Field (CRF), and Support Vector Machine (SVM). The training set consists of approximately 272K wordforms. The method is tested with Bengali. A semi-supervised learning technique has been developed that uses the unlabeled data during training. We have shown that simply relying upon the use of large corpora for performance improvement is not in itself sufficient. We describe measures to automatically select documents and sentences from the unlabeled data. In addition, we have used a number of techniques to post-process the output of each of the models in order to improve performance. Finally, we have applied a weighted voting approach to combine the models. Experimental results show the effectiveness of the proposed approach, with overall average recall, precision, and f-score values of 93.79%, 91.34%, and 92.55%, respectively, which is an improvement of 19.4% over the least performing baseline ME system and of 15.19% over the best performing baseline SVM system.
Summary (translated from the Slovenian "Povzetek"): A method for named entity recognition has been developed, based on weighted voting of several classifiers.
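The weighted voting step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the weights are chosen or how ties are resolved, so everything specific here is an assumption for illustration only: the function name `weighted_vote`, the example tags, the weight values, and the alphabetical tie-break are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tag sequences from several classifiers.

    predictions: dict mapping a model name to its list of predicted tags
                 (all lists must cover the same tokens, in the same order).
    weights:     dict mapping a model name to a non-negative weight
                 (hypothetical here; e.g. a score measured on held-out data).
    Returns the list of tags with the highest total weight per token.
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for model, tags in predictions.items():
            # Each model adds its weight to the tag it proposes for token i.
            scores[tags[i]] += weights[model]
        # Assumed tie-break: among equal scores, pick the alphabetically
        # first tag so the result is deterministic.
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Illustrative inputs only: three models tagging a three-token sentence.
example_predictions = {
    "ME": ["PER", "O", "LOC"],
    "CRF": ["PER", "O", "O"],
    "SVM": ["O", "O", "LOC"],
}
example_weights = {"ME": 0.7, "CRF": 0.8, "SVM": 0.9}  # hypothetical values
example_result = weighted_vote(example_predictions, example_weights)
```

With these illustrative weights, two weaker models that agree can outvote the single strongest model, which is the property that distinguishes weighted voting from simply trusting the best classifier.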