Authors: Han-Cheol Cho, Naoaki Okazaki, Makoto Miwa, Jun’ichi Tsujii
DOI: 10.1016/J.IPM.2013.03.002
Keywords: Named-entity recognition, Stability (learning theory), Conditional random field, Discriminative model, Computer science, CRFs, Natural language processing, Feature engineering, Sequence labeling, Feature (machine learning), Pattern recognition, Artificial intelligence
Abstract: Named entity recognition (NER) is mostly formalized as a sequence labeling problem in which segments of named entities are represented by label sequences. Although considerable effort has been made to investigate sophisticated features that encode textual characteristics of named entities (e.g. PEOPLE, LOCATION, etc.), little attention has been paid to segment representations (SRs) for multi-token named entities (e.g. the IOB2 notation). In this paper, we investigate the effects of different SRs on NER tasks, and propose a feature generation method using multiple SRs. The proposed method allows a model to exploit not only the highly discriminative features of complex SRs but also the robust features of simple SRs against the data sparseness problem. Since it incorporates different SRs as feature functions of Conditional Random Fields (CRFs), we can use the well-established procedure for training. In addition, the tagging speed of a model integrating multiple SRs can be accelerated to be equivalent to that of a model using only the most complex SR of the integrated model. Experimental results demonstrate that incorporating multiple SRs into a single model improves the performance and stability of NER. We also provide a detailed analysis of the results.
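
To make the notion of a segment representation concrete, the following is a minimal sketch rather than the authors' implementation. It encodes the same entity segments under three common schemes (IO, IOB2, IOBES) and shows that a label in the most complex scheme determines the label in each simpler one, which is one way to read how features over several SRs could be attached to a single label sequence. The function names `encode` and `project`, the choice of schemes, and the example sentence are assumptions made for illustration.

```python
from typing import List, Tuple

# A segment is (start, end_exclusive, entity_type); this layout is an assumption.
Segment = Tuple[int, int, str]


def encode(n_tokens: int, segments: List[Segment], scheme: str) -> List[str]:
    """Turn entity segments into one label per token under the given scheme."""
    labels = ["O"] * n_tokens
    for start, end, etype in segments:
        for i in range(start, end):
            if scheme == "IO":
                prefix = "I"
            elif scheme == "IOB2":
                prefix = "B" if i == start else "I"
            elif scheme == "IOBES":
                if end - start == 1:
                    prefix = "S"          # single-token entity
                elif i == start:
                    prefix = "B"
                elif i == end - 1:
                    prefix = "E"
                else:
                    prefix = "I"
            else:
                raise ValueError(f"unknown scheme: {scheme}")
            labels[i] = f"{prefix}-{etype}"
    return labels


def project(label: str, scheme: str) -> str:
    """Map an IOBES label to the equivalent label in a simpler scheme."""
    if label == "O" or scheme == "IOBES":
        return label
    prefix, etype = label.split("-", 1)
    if scheme == "IOB2":
        return ("B" if prefix in ("B", "S") else "I") + "-" + etype
    if scheme == "IO":
        return "I-" + etype
    raise ValueError(f"unknown scheme: {scheme}")


tokens = ["Barack", "Obama", "visited", "New", "York", "City", "."]
segments = [(0, 2, "PER"), (3, 6, "LOC")]

for scheme in ("IO", "IOB2", "IOBES"):
    print(scheme, encode(len(tokens), segments, scheme))

# Each IOBES label yields one label per simpler scheme, so a feature
# generator could emit features keyed on all three for the same position.
iobes = encode(len(tokens), segments, "IOBES")
multi_sr = [{s: project(lab, s) for s in ("IO", "IOB2", "IOBES")} for lab in iobes]
print(multi_sr[0])
```

In this sketch the simpler schemes collapse distinctions (IO cannot separate adjacent entities of the same type), which is why their labels are seen more often in training data, while IOBES keeps boundary information at the cost of sparser label counts.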