Automated literature mining and hypothesis generation through a network of Medical Subject Headings

Authors: Stephen Joseph Wilson, Angela Dawn Wilkins, Matthew V. Holt, Byung Kwon Choi, Daniel Konecki

DOI: 10.1101/403667

Abstract: The scientific literature is vast, growing, and increasingly specialized, making it difficult to connect disparate observations across subfields. To address this problem, we sought to develop automated hypothesis generation by networking at scale the MeSH terms curated by the National Library of Medicine. The result is a Mesh Term Objective Reasoning (MeTeOR) approach that tallies associations among genes, drugs and diseases from PubMed and predicts new ones. Comparisons to reference databases and algorithms show MeTeOR tends to be more reliable. We also show that many predictions based on the literature prior to 2014 were published subsequently. In a practical application, we validated experimentally a surprising new association found by MeTeOR between novel Epidermal Growth Factor Receptor (EGFR) associations and CDK2. We conclude that MeTeOR generates useful hypotheses from the literature (http://meteor.lichtargelab.org/).

Author summary: The large size and exponential expansion of the scientific literature forms a bottleneck to accessing and understanding published findings. Manual curation and Natural Language Processing (NLP) aim to address this bottleneck by summarizing and disseminating the knowledge within articles as key relationships (e.g. TP53 relates to Cancer). However, these methods compromise on either coverage or accuracy, respectively. To mitigate this compromise, we proposed using manually-assigned keywords (MeSH terms) to extract relationships from publications and demonstrated comparable coverage but higher accuracy than current NLP methods. Furthermore, we combined the extracted knowledge with semi-supervised machine learning to create hypotheses to guide future work and discovered a direct interaction between two important cancer genes.
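The tallying step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an association between two terms is counted once for every article annotated with both of them; the abstract does not spell out the weighting MeTeOR actually uses, so this is an illustration of the general co-occurrence idea, not the authors' implementation. The function name `tally_cooccurrences` and the toy article annotations are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tally_cooccurrences(articles):
    """Count how many articles are annotated with each unordered pair of terms."""
    counts = Counter()
    for terms in articles:
        # Deduplicate and sort so every pair has one canonical ordering.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy input: each inner list stands for one article's MeSH terms.
toy_articles = [
    ["Neoplasms", "Receptor, Epidermal Growth Factor", "Gefitinib"],
    ["Neoplasms", "Receptor, Epidermal Growth Factor"],
    ["Cyclin-Dependent Kinase 2", "Neoplasms"],
]

# Edge weights of the resulting term network, keyed by sorted term pair.
edge_weights = tally_cooccurrences(toy_articles)
```

The resulting counter can be read as a weighted network in which terms are nodes and counts are edge weights. Predicting new associations, which the abstract attributes to semi-supervised machine learning, would be a separate step on top of such a network and is not shown here.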
