Rutabaga by any other name: extracting biological names

Authors: Lynette Hirschman, Alexander A. Morgan, Alexander S. Yeh

DOI: 10.1016/S1532-0464(03)00014-5

Abstract: As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their findings by relying on precise terms; these terms then provide indices into the literature and across the growing number of databases. This article examines emerging techniques to access biological resources through the extraction of entity names and the relations among them. Information extraction has been an active area of research in natural language processing, and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93-95% range for identifying person, organization, and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75-80% range. Multiple factors may be involved, including the absence of shared training and test sets and of rigorous measures of progress, the lack of annotated training data specific to biological tasks, the pervasive ambiguity of terms, the frequent introduction of new terms, and a mismatch between the evaluation tasks as defined and the real problems of biologists. We present evidence from a simple lexical matching exercise that illustrates some of the problems encountered when identifying biological names. We conclude by outlining a research agenda to raise the performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
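
As a rough illustration of what a simple lexical matching exercise and its precision/recall scoring might look like, the sketch below matches a small name list against a sentence and scores the result against a hand-labeled answer set. It is not the authors' procedure: the function names `find_names` and `precision_recall_f`, the toy lexicon, the sentence, and the gold labels are all assumptions made up for this example.

```python
import re


def find_names(text, lexicon):
    """Return the lexicon entries that occur in text as whole tokens (case-insensitive)."""
    lowered = text.lower()
    found = set()
    for name in lexicon:
        # Require non-alphanumeric boundaries so "for" does not match inside "before".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name.lower()) + r"(?![A-Za-z0-9])"
        if re.search(pattern, lowered):
            found.add(name)
    return found


def precision_recall_f(predicted, gold):
    """Compute precision, recall, and balanced F-measure over two sets of names."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (hypothetical): "for" stands in for a gene symbol that is also a common
# English word, and "wingless" is a name missing from the lexicon.
lexicon = {"white", "hedgehog", "for", "dpp"}
sentence = "We screened for white eye color and found that wingless interacts with hedgehog."
gold = {"white", "wingless", "hedgehog"}

predicted = find_names(sentence, lexicon)
precision, recall, f_measure = precision_recall_f(predicted, gold)
```

In this toy case the matcher picks up the ordinary word "for" as a name (an ambiguity error that lowers precision) and misses "wingless" because it is absent from the lexicon (a coverage error that lowers recall), which are two of the factors the abstract lists.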

References (28)
Claire Grover, Marc Moens, Andrei Mikheev. Description of the LTG system used for MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Chinatsu Aone, Mila Ramos-Santacruz, Lauren Halverson, Tom Hampton. SRA: Description of the IE2 System Used for MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Paul Wu, Shihong Yu, Shuanhu Bai. Description of the Kent Ridge Digital Labs System Used for MUC-7. Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998 (1998).
Daniel M. Bikel, Richard Schwartz, Ralph M. Weischedel. An Algorithm that Learns What's in a Name. Machine Learning, vol. 34, pp. 211-231 (1999). DOI: 10.1023/A:1007558221122
K. Fukuda, T. Tsunoda, A. Tamura, T. Takagi. Toward information extraction: identifying protein names from biological papers. Pacific Symposium on Biocomputing, pp. 707-718 (1998).
Bernard Jacq, Laurent Julliard, Denys Proux, Francois Rechenmann, Violaine Pillet. Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction. Genome Informatics, vol. 9, pp. 72-80 (1998). DOI: 10.11234/GI1990.9.72
Mark Craven, Johan Kumlien. Constructing Biological Knowledge Bases by Extracting Information from Text Sources. Intelligent Systems in Molecular Biology, pp. 77-86 (1999).
L. Hirschman, R. Gaizauskas. Natural language question answering: the view from here. Natural Language Engineering, vol. 7, pp. 275-300 (2001). DOI: 10.1017/S1351324901002807
Mark Stevenson, Robert Gaizauskas. Using Corpus-derived Name Lists for Named Entity Recognition. Conference on Applied Natural Language Processing, pp. 290-295 (2000). DOI: 10.3115/974147.974187
Beth M. Sundheim. Overview of results of the MUC-6 evaluation. Proceedings of the 6th Conference on Message Understanding (MUC6 '95), pp. 13-31 (1995). DOI: 10.3115/1072399.1072402