Quantifying the Impact and Extent of Undocumented Biomedical Synonymy

Authors: David R. Blair, Kanix Wang, Svetlozar Nestorov, James A. Evans, Andrey Rzhetsky

DOI: 10.1371/JOURNAL.PCBI.1003799

Abstract: Synonymous relationships among biomedical terms are extensively annotated within specialized terminologies, implying that synonymy is important for practical computational applications within this field. It remains unclear, however, whether text mining actually benefits from documented synonymy and whether existing biomedical thesauri provide adequate coverage of these linguistic relationships. In this study, we examine the impact and extent of undocumented synonymy within a very large compendium of biomedical thesauri. First, we demonstrate that missing synonymy has a significant negative impact on named entity normalization, an important problem within the field of biomedical text mining. To estimate the amount of synonymy currently missing from thesauri, we develop a probabilistic model for the construction of synonym terminologies that is capable of handling a wide range of potential biases, and we evaluate its performance using the broader domain of near-synonymy among general English words. Our model predicts that over 90% of these relationships are currently undocumented, a result that we support experimentally through "crowd-sourcing." Finally, we apply our model to biomedical terminologies and predict that they are missing the vast majority (>90%) of the synonymous relationships they intend to document. Overall, our results expose the dramatic incompleteness of current biomedical thesauri and suggest the need for "next-generation," high-coverage lexical terminologies.
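The abstract does not spell out the probabilistic model, but the core intuition (that overlap between independently built thesauri reveals how much synonymy none of them records) can be illustrated with the classic two-sample capture-recapture estimator, in Chapman's bias-corrected Lincoln-Petersen form. This is a swapped-in textbook technique, not the authors' model: it assumes the two thesauri sample synonym pairs independently and with equal probability, assumptions real terminologies are unlikely to satisfy, whereas the abstract notes that the authors' model handles a wide range of potential biases. The function names and the toy data below are hypothetical.

```python
from typing import FrozenSet, Set

# A synonym pair is stored as an unordered pair of terms (hypothetical representation).
Pair = FrozenSet[str]


def chapman_total(a: Set[Pair], b: Set[Pair]) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of the total number
    of synonym pairs, treating thesauri A and B as two independent samples."""
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


def undocumented_fraction(a: Set[Pair], b: Set[Pair]) -> float:
    """Estimated share of all synonym pairs that appear in neither thesaurus."""
    total = chapman_total(a, b)
    documented = len(a | b)  # pairs recorded by at least one thesaurus
    return max(0.0, 1.0 - documented / total)


# Hypothetical toy data: two thesauri listing 100 pairs each, sharing 20 of them.
thesaurus_a = {frozenset((f"t{i}", f"s{i}")) for i in range(0, 100)}
thesaurus_b = {frozenset((f"t{i}", f"s{i}")) for i in range(80, 180)}

print(round(chapman_total(thesaurus_a, thesaurus_b), 1))          # 484.8
print(round(undocumented_fraction(thesaurus_a, thesaurus_b), 3))  # 0.629
```

With only 20 of 100 pairs shared, the estimator infers that the two toy thesauri together document 180 of roughly 485 pairs, leaving about 63% undocumented. Low overlap between sources is what drives such estimates upward; correcting for unequal and correlated "catchability" of pairs is where a more elaborate model becomes necessary.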

参考文章(73)
Thomas C. Rindflesch, Jonathan R. Nebeker, Doug Redd, Qing T. Zeng, Synonym, topic model and predicate-based query expansion for retrieving clinical documents. american medical informatics association annual symposium. ,vol. 2012, pp. 1050- 1059 ,(2012)
Nigam H. Shah, Mark A. Musen, Rong Xu, A Comprehensive Analysis of Five Million UMLS Metathesaurus Terms Using Eighteen Million MEDLINE Citations. american medical informatics association annual symposium. ,vol. 2010, pp. 907- 911 ,(2010)
Gregory Grefenstette, Automatic Thesaurus Generation from Raw Text using Knowledge-Poor Techniques MAKING SENSE OF WORDS. NINTH ANNUAL CONFERENCE OF THE UW CENTRE FOR THE NEW OED AND TEXT RESEARCH. ,(1993)
Barbara Ann Kipfer, 21st Century Synonym and Antonym Finder ,(1995)
J. I. Rodale, The Synonym Finder ,(1958)
Kent A. Spackman, Colin Price, Michael Q. Stearns, Amy Y. Wang, SNOMED clinical terms: overview of the development process and project status. american medical informatics association annual symposium. pp. 662- 666 ,(2001)
Robert C. Merton, Theory of rational option pricing ,(2011)
Alan R. Aronson, Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program american medical informatics association annual symposium. pp. 17- 21 ,(2001)
Sherri de Coronado, Margaret W Haber, Nicholas Sioutos, Mark S Tuttle, Lawrence W Wright, None, NCI Thesaurus: using science-based terminology to integrate cancer research results. Studies in health technology and informatics. ,vol. 107, pp. 33- 37 ,(2004)