Named entity recognition with multiple segment representations

Authors: Han-Cheol Cho, Naoaki Okazaki, Makoto Miwa, Jun’ichi Tsujii

DOI: 10.1016/J.IPM.2013.03.002

Keywords: Named-entity recognition, Stability (learning theory), Conditional random field, Discriminative model, Computer science, CRFS, Natural language processing, Feature engineering, Sequence labeling, Feature (machine learning), Pattern recognition, Artificial intelligence

Abstract: Named entity recognition (NER) is mostly formalized as a sequence labeling problem in which segments of named entities are represented by label sequences. Although considerable effort has been made to investigate sophisticated features that encode textual characteristics of named entities (e.g. PEOPLE, LOCATION, etc.), little attention has been paid to segment representations (SRs) for multi-token named entities (e.g. the IOB2 notation). In this paper, we investigate the effects of different SRs on NER tasks, and propose a feature generation method using multiple SRs. The proposed method allows a model to exploit not only the highly discriminative features of complex SRs but also the robust features of simple SRs against the data sparseness problem. Since it incorporates different SRs as feature functions of Conditional Random Fields (CRFs), we can use the well-established procedure for training. In addition, the tagging speed of a model integrating multiple SRs can be accelerated to be equivalent to that of a model using only the most complex SR of the integrated model. Experimental results demonstrate that incorporating multiple SRs into a single model improves the performance and stability of NER. We also provide a detailed analysis of the results.
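To make the notion of a segment representation concrete, the sketch below encodes the same entity segments under three label schemes. The abstract names only IOB2. IO and IOBES are used here as commonly seen simpler and more complex alternatives, which is an assumption about the schemes involved. The function `encode`, the example sentence, and the entity types are hypothetical, and this illustrates the label schemes only, not the paper's feature generation method.

```python
from typing import Dict, List, Tuple

# A segment is (start, end_exclusive, entity_type) over token indices.
Segment = Tuple[int, int, str]


def encode(n_tokens: int, segments: List[Segment], scheme: str) -> List[str]:
    """Return one label per token for the given segment representation."""
    labels = ["O"] * n_tokens
    for start, end, etype in segments:
        for i in range(start, end):
            if scheme == "IO":
                # Simplest scheme: every entity token is "inside".
                prefix = "I"
            elif scheme == "IOB2":
                # The first token of every segment is marked B.
                prefix = "B" if i == start else "I"
            elif scheme == "IOBES":
                # Also marks segment ends (E) and single-token segments (S).
                if end - start == 1:
                    prefix = "S"
                elif i == start:
                    prefix = "B"
                elif i == end - 1:
                    prefix = "E"
                else:
                    prefix = "I"
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            labels[i] = f"{prefix}-{etype}"
    return labels


# Hypothetical example sentence and gold segments.
tokens = ["Barack", "Obama", "visited", "New", "York", "City", "and", "Tokyo"]
segments = [(0, 2, "PER"), (3, 6, "LOC"), (7, 8, "LOC")]

views: Dict[str, List[str]] = {
    scheme: encode(len(tokens), segments, scheme)
    for scheme in ("IO", "IOB2", "IOBES")
}

for scheme, labels in views.items():
    print(scheme, labels)
```

The richer the scheme, the more distinct labels each entity type produces (one for IO, two for IOB2, four for IOBES), so each label is seen less often in training data. This is the trade-off between discriminative power and data sparseness that the abstract describes.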
