eGIFT: Mining Gene Information from the Literature

Authors: Catalina O Tudor, Carl J Schmidt, K Vijay-Shanker

DOI: 10.1186/1471-2105-11-418

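Abstract: With the biomedical literature continually expanding, searching PubMed for information about specific genes becomes increasingly difficult. Not only can thousands of results be returned, but gene name ambiguity leads to many irrelevant hits. As a result, it is difficult for life scientists and gene curators to rapidly get an overall picture of a specific gene from documents that mention its names and synonyms. In this paper, we present eGIFT (http://biotm.cis.udel.edu/eGIFT), a web-based tool that associates informative terms, called iTerms, and sentences containing them, with genes. To associate iTerms with a gene, eGIFT ranks iTerms about the gene based on a score which compares the frequency of occurrence of a term in the gene's literature to its frequency of occurrence in documents about genes in general. To retrieve a gene's documents (Medline abstracts), eGIFT considers all gene names, aliases, and synonyms. Since many of the gene names can be ambiguous, eGIFT applies a disambiguation step to remove matches that do not correspond to the gene. An additional filtering process is applied to retain those abstracts that focus on the gene rather than mention it in passing. eGIFT's information for a gene is pre-computed, and users can search for genes by using a name or an EntrezGene identifier. iTerms are grouped into different categories to facilitate a quick inspection. eGIFT also links an iTerm to sentences mentioning the term to allow users to see the relation between the iTerm and the gene. We evaluated the precision and recall of eGIFT's iTerms for 40 genes; between 88% and 94% of the iTerms were marked as salient by our evaluators, and 94% of the UniProtKB keywords for these genes were also identified by eGIFT as iTerms. Our evaluations suggest that iTerms capture highly-relevant aspects of genes. Furthermore, by showing sentences containing these terms, eGIFT can provide a quick description of a specific gene. eGIFT helps not only life scientists survey the results of high-throughput experiments, but also annotators find articles describing gene aspects and functions.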
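
The ranking idea described in the abstract, comparing how often a term occurs in a gene's literature with how often it occurs in literature about genes in general, can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so the score used here (a simple difference of relative document frequencies) is an assumption made for illustration, as are the function names `document_frequency` and `rank_terms` and the whitespace tokenization. It is not eGIFT's actual implementation.

```python
from collections import Counter


def document_frequency(abstracts):
    """Count, for each term, the number of abstracts that contain it."""
    counts = Counter()
    for text in abstracts:
        # A set ensures each term is counted at most once per abstract.
        counts.update(set(text.lower().split()))
    return counts


def rank_terms(gene_abstracts, background_abstracts):
    """Rank terms by how much more often they occur in the gene's abstracts.

    Assumed score (hypothetical, not taken from the paper): the fraction of
    the gene's abstracts containing the term minus the fraction of background
    abstracts containing it.
    """
    gene_df = document_frequency(gene_abstracts)
    background_df = document_frequency(background_abstracts)
    n_gene = max(len(gene_abstracts), 1)
    n_background = max(len(background_abstracts), 1)

    scores = {}
    for term, count in gene_df.items():
        gene_rate = count / n_gene
        background_rate = background_df.get(term, 0) / n_background
        scores[term] = gene_rate - background_rate

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Under this assumed score, a term that appears in most of a gene's abstracts but rarely in the background set ranks near the top, while common words such as "the" receive a low or negative score. A real system would also need the name disambiguation and abstract filtering steps that the abstract mentions, which this sketch leaves out.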
