Mining Wiki Resources for Multilingual Named Entity Recognition

Authors: Patrick Schone, Alexander E. Richman

DOI:

Keywords: Ukrainian, Less Commonly Taught Languages, Entity linking, Named entity, Portuguese, Natural language processing, Computer science, Process (engineering), Named-entity recognition, Artificial intelligence

Abstract: In this paper, we describe a system by which the multilingual characteristics of Wikipedia can be utilized to annotate a large corpus of text with Named Entity Recognition (NER) tags, requiring minimal human intervention and no linguistic expertise. This process, though of value in languages for which resources exist, is particularly useful for less commonly taught languages. We show how the Wikipedia format can be used to identify possible named entities, and we discuss in detail the process by which we use the Category structure inherent to Wikipedia to determine the named entity type of a proposed entity. We further describe the methods by which English language data can be used to bootstrap the NER process in other languages. We demonstrate the system by using the generated corpus as training sets for a variant of BBN's Identifinder in French, Ukrainian, Spanish, Polish, Russian, and Portuguese, achieving overall F-scores as high as 84.7% on independent, human-annotated corpora, comparable to a system trained on up to 40,000 words of human-annotated newswire.
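The two steps named in the abstract, finding candidate entities from Wikipedia's link markup and typing them from their categories, can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expression, the keyword lists in `CATEGORY_KEYWORDS`, the type labels, and the helper names `extract_links`, `guess_entity_type`, and `tag_links` are all assumptions made for illustration, and the category lookup is passed in as a plain dictionary rather than read from a Wikipedia dump.

```python
import re

# Matches [[Target]] and [[Target|anchor text]]; a "#section" suffix is skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")

# Hypothetical keyword lists and type labels, chosen for illustration only.
CATEGORY_KEYWORDS = {
    "PERSON": ("births", "deaths", "people"),
    "GPE": ("cities", "countries", "capitals"),
    "ORGANIZATION": ("companies", "organizations", "universities"),
}


def extract_links(wikitext):
    """Return (article_title, surface_text) pairs for each internal link."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        title = match.group(1).strip()
        surface = (match.group(2) or title).strip()
        links.append((title, surface))
    return links


def guess_entity_type(categories):
    """Return the first type whose keyword occurs in a category, else None."""
    for category in categories:
        lowered = category.lower()
        for entity_type, keywords in CATEGORY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                return entity_type
    return None


def tag_links(wikitext, categories_by_title):
    """Pair each linked surface string with a guessed entity type."""
    tagged = []
    for title, surface in extract_links(wikitext):
        entity_type = guess_entity_type(categories_by_title.get(title, []))
        if entity_type is not None:
            tagged.append((surface, entity_type))
    return tagged


if __name__ == "__main__":
    text = "[[Kyiv|Kiev]] is the capital of [[Ukraine]]."
    cats = {"Kyiv": ["Capitals in Europe"], "Ukraine": ["Countries in Europe"]}
    print(tag_links(text, cats))
```

Links whose target has no matching category are left untagged in this sketch, which keeps the generated annotations conservative at the cost of recall.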
