BioCreAtIvE Task 1A: gene mention finding evaluation

Authors: Alexander Yeh, Alexander Morgan, Marc Colosimo, Lynette Hirschman

DOI: 10.1186/1471-2105-6-S1-S2

Abstract: The biological research literature is a major repository of knowledge. As the amount of literature increases, it will get harder to find the information of interest on a particular topic. There has been an increasing amount of work on text mining this literature, but comparing this work is hard because of a lack of standards for making comparisons. To address this, we worked with colleagues at the Protein Design Group, CNB-CSIC, Madrid to develop BioCreAtIvE (Critical Assessment for Information Extraction in Biology), an open common evaluation of systems on a number of biological text mining tasks. We report here on task 1A, which deals with finding mentions of genes and related entities in text. "Finding mentions" is a basic task, which can be used as a building block for other text mining tasks. The task makes use of data and evaluation software provided by the (US) National Center for Biotechnology Information (NCBI). 15 teams took part in task 1A. A number of teams achieved scores over 80% F-measure (balanced precision and recall). The teams that tried to use their task 1A systems to help on other BioCreAtIvE tasks reported mixed results. The 80% plus F-measure results are good, but still somewhat lag the best scores achieved in some other domains such as newswire, due in part to the complexity and length of gene names, compared to person or organization names in newswire.
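For readers unfamiliar with the metric, the balanced F-measure named in the abstract is conventionally the harmonic mean of precision and recall. The sketch below illustrates that standard definition only; it is not the NCBI evaluation software used in task 1A, and the function name and the counts in the usage note are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure from mention counts (illustrative, not the official scorer).

    tp: predicted gene mentions that match the gold standard
    fp: predicted mentions with no gold-standard match
    fn: gold-standard mentions the system missed
    """
    # Precision: fraction of predicted mentions that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of gold-standard mentions that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean weights precision and recall equally.
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 80 correct mentions, 20 spurious ones, and 20 missed ones, `f_measure(80, 20, 20)` gives precision and recall of 0.8 each and an F-measure of approximately 0.8, the level the abstract reports for the stronger teams.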
