Named Entity Recognition Using Appropriate Unlabeled Data, Post-processing and Voting

Authors: Sivaji Bandyopadhyay, Asif Ekbal

DOI:

Keywords: Bengali, Voting, Weighted voting, Support vector machine, Machine learning, Pattern recognition, Performance improvement, Principle of maximum entropy, Conditional random field, Named-entity recognition, Computer science, Artificial intelligence

Abstract: This paper reports how appropriate unlabeled data, post-processing and voting can be effective in improving the performance of a Named Entity Recognition (NER) system. The proposed method is based on a combination of the following classifiers: Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM). The training set consists of approximately 272K wordforms. The proposed method has been tested with Bengali. A semi-supervised learning technique has been developed that uses the unlabeled data during training. We have shown that simply relying upon the use of large corpora for performance improvement is not in itself sufficient. We describe measures to automatically select documents and sentences from the unlabeled data. In addition, we have used a number of techniques to post-process the output of each of the models in order to improve their performance. Finally, we have applied a weighted voting approach to combine the models. Experimental results show the effectiveness of the proposed approach, with overall average recall, precision and f-score values of 93.79%, 91.34% and 92.55%, respectively. This is an improvement of 19.4% over the least-performing baseline, the ME-based system, and of 15.19% over the best-performing baseline, the SVM-based system.

Summary (translated from Slovenian): A method for named entity recognition has been developed, based on weighted voting of several classifiers.
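The abstract names weighted voting over the ME, CRF and SVM outputs but does not spell out the weighting scheme. The following is a minimal Python sketch of token-level weighted voting, assuming each classifier emits exactly one tag per token and that the caller supplies one numeric weight per classifier (for instance, its F-score on held-out data; that choice is an assumption here, not necessarily the authors'). The function names `weighted_vote` and `f_score`, the B-/I-/O tags and all numbers in the usage example are hypothetical. `f_score` is the standard balanced F-score (harmonic mean of precision and recall); with the reported 91.34% precision and 93.79% recall it gives 92.55%, consistent with the abstract.

```python
from collections import defaultdict


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_vote(predictions: dict[str, list[str]],
                  weights: dict[str, float]) -> list[str]:
    """Combine per-token NE tags from several classifiers by weighted voting.

    predictions maps a classifier name (e.g. "ME", "CRF", "SVM") to its tag
    sequence for one sentence; weights maps the same names to voting weights.
    The weighting scheme is an assumption: any non-negative score works.
    """
    if not predictions:
        return []
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Sum the weights of every classifier that proposed each tag.
        scores: dict[str, float] = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # max() returns the first maximal key, so ties go to the tag
        # that was proposed first.
        combined.append(max(scores, key=scores.get))
    return combined


if __name__ == "__main__":
    # Hypothetical outputs for a three-token sentence.
    preds = {
        "ME": ["B-PER", "I-PER", "O"],
        "CRF": ["B-PER", "O", "O"],
        "SVM": ["B-LOC", "O", "O"],
    }
    # Illustrative (made-up) weights, e.g. held-out F-scores.
    weights = {"ME": 0.75, "CRF": 0.80, "SVM": 0.78}
    print(weighted_vote(preds, weights))  # ['B-PER', 'O', 'O']
    print(round(f_score(91.34, 93.79), 2))  # 92.55
```

Weighting each vote by a per-model score lets a stronger classifier overrule a weaker one when only two models disagree; with equal weights the function reduces to plain majority voting.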
