Mencius: A Chinese Named Entity Recognizer Using the Maximum Entropy-based Hybrid Model

Authors: Cheng-Wei Lee, Cheng-Wei Shih, Tzong-Han Tsai, Shih-Hung Wu, Wen-Lian Hsu

DOI: 10.30019/IJCLCLP.200402.0004

Abstract: This paper presents a Chinese named entity recognizer (NER): Mencius. It aims to address NER problems by combining the advantages of rule-based and machine learning (ML) based systems. Rule-based systems can explicitly encode human comprehension and can be tuned conveniently, while ML-based systems are robust, portable, and inexpensive to develop. Our hybrid system incorporates a knowledge representation and template-matching tool, called InfoMap [Wu et al. 2002], into a maximum entropy (ME) framework. Named entities are represented in InfoMap as templates, which serve as ME features. These templates are edited manually, and their weights are estimated by the ME framework according to the training data. To understand how word segmentation might influence the differences between a pure template-based method and our hybrid method, we configure Mencius using four distinct settings. The F-measures for person names (PER), location names (LOC), and organization names (ORG) in the best configuration of our experiment were 94.3%, 77.8%, and 75.3%, respectively. A comparison of the results obtained with these configurations reveals that the hybrid NER systems always perform better at identifying person names. On the other hand, they have some difficulty identifying location and organization names. Furthermore, the word segmentation module improves the performance of pure template-based systems, but it has little effect on the hybrid NER systems.
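The abstract describes templates that act as ME features, with weights estimated from training data. The sketch below illustrates only that general idea and is not the authors' implementation: the three templates, the training strings, and the estimation method (plain gradient ascent on the conditional log-likelihood with an L2 penalty, rather than whatever procedure Mencius actually uses) are all hypothetical assumptions. InfoMap itself is not modeled. Each feature pairs a template with a label, and the model computes p(y|x) = exp(Σ λ_i f_i(x, y)) / Z(x) over pre-segmented candidate strings. Finding entity boundaries, which a real NER system must do, is out of scope here.

```python
import math
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL templates standing in for InfoMap templates; not taken from the paper.
Template = Callable[[str], bool]
TEMPLATES: Dict[str, Template] = {
    "surname_prefix": lambda s: s[:1] in {"王", "李", "張", "陳"},
    "ends_with_city": lambda s: s.endswith("市"),
    "ends_with_company": lambda s: s.endswith("公司"),
}
LABELS = ["PER", "LOC", "ORG"]
Key = Tuple[str, str]  # (template name, label) identifies one ME feature


def features(s: str, y: str) -> List[Key]:
    """Feature keys that fire for candidate s under label y."""
    return [(name, y) for name, t in TEMPLATES.items() if t(s)]


def prob(s: str, w: Dict[Key, float]) -> Dict[str, float]:
    """Conditional ME distribution p(y | s) = exp(score) / Z(s)."""
    scores = {y: sum(w.get(k, 0.0) for k in features(s, y)) for y in LABELS}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(v - m) for y, v in scores.items()}
    z = sum(exps.values())
    return {y: v / z for y, v in exps.items()}


def train(data: List[Tuple[str, str]], epochs: int = 200,
          lr: float = 0.5, l2: float = 0.01) -> Dict[Key, float]:
    """ASSUMED estimator: gradient ascent on log-likelihood with L2 penalty."""
    w: Dict[Key, float] = {}
    for _ in range(epochs):
        grad: Dict[Key, float] = {}
        for s, gold in data:
            # Empirical feature counts from the gold label.
            for k in features(s, gold):
                grad[k] = grad.get(k, 0.0) + 1.0
            # Minus the feature counts expected under the current model.
            p = prob(s, w)
            for y in LABELS:
                for k in features(s, y):
                    grad[k] = grad.get(k, 0.0) - p[y]
        for k in set(grad) | set(w):
            w[k] = w.get(k, 0.0) + lr * (grad.get(k, 0.0) - l2 * w.get(k, 0.0))
    return w


# HYPOTHETICAL toy training data.
data = [("王小明", "PER"), ("李大同", "PER"), ("台北市", "LOC"),
        ("高雄市", "LOC"), ("台積電公司", "ORG")]
weights = train(data)
print(prob("陳美玲", weights))  # highest probability on PER
```

Because each template is paired with every label, the learned weights express how strongly a template match votes for each entity type. This mirrors the abstract's division of labor: humans write the templates, and the data decides how much to trust them.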

References (16)
Claire Grover, Marc Moens, Andrei Mikheev. Description of the LTG system used for MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Hsin-Hsi Chen, Shih-Chung Tsai, Guo-Wei Bian, Yung-Wei Ding. Description of the NTU System used for MET-2. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Paul Wu, Shihong Yu, Shuanhu Bai. Description of the Kent Ridge Digital Labs System Used for MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Michael Crystal, Lance Ramshaw, Richard Schwartz, Heidi Fox, Rebecca Stone, Scott Miller, Ralph Weischedel. BBN: Description of the SIFT System as Used for MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Ralph Grishman, Andrew Eliot Borthwick. A maximum entropy approach to named entity recognition. Ph.D. thesis, New York University (1999).
Ralph Grishman. Information Extraction: Techniques and Challenges. Lecture Notes in Computer Science, pp. 10-27 (1997). DOI: 10.1007/3-540-63438-X_2.
Daniel M. Bikel, Richard Schwartz, Ralph M. Weischedel. An Algorithm that Learns What's in a Name. Machine Learning, vol. 34, pp. 211-231 (1999). DOI: 10.1023/A:1007558221122.
Ralph Grishman, Andrew Borthwick, Eugene Agichtein, John Sterling. NYU: Description of the MENE Named Entity System as Used in MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Shih-Hung Wu, Min-Yuh Day, Tzong-Han Tsai, Wen-Lian Hsu. FAQ-Centered Organizational Memory. International Joint Conference on Artificial Intelligence, pp. 103-112 (2002). DOI: 10.1007/978-1-4615-0947-9_9.
David D. McDonald. Internal and external evidence in the identification and semantic categorization of proper names. Corpus Processing for Lexical Acquisition, pp. 21-39 (1996).