Web-based biomedical literature mining

Authors: Lu Zhang, Jian-fu An, Hui-ping Xue, Ying Chen, Jian-guo Wu

DOI: 10.1007/S12204-012-1311-Z

Keywords: Term (time); Receiver operating characteristic; Sensitivity (control systems); Web application; tf–idf; Computer science; Bayesian algorithm; Network model; Data mining; Recall rate

Abstract: With the upsurge in biomedical literature, using data-mining methods to discover new knowledge from the literature has been drawing increasing attention from scholars. In this study, taking the mining of non-coding gene literature from the PubMed network database as an example, we first preprocessed the abstract data, next applied the term occurrence frequency (TF) and inverse document frequency (IDF) method (TF-IDF) to select features, and then established a model based on a Bayesian algorithm. Finally, we assessed the model through the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, precision rate and recall rate. When 1 000 features are selected, the AUC is 0.868 3, and the values reported for the accuracy rate and the other measures are 84.63%, 89.02%, 86.83%, 89.02% and 98.14%. These results indicate that our method can effectively identify the targeted literature related to a particular topic.
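The pipeline described in the abstract (preprocessing, TF-IDF feature selection, a Bayesian classifier, and evaluation by AUC and confusion-matrix measures) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which Bayesian variant or which TF-IDF ranking rule was used, so the multinomial naive Bayes with add-one smoothing, the summed TF-IDF ranking, all function names, the toy abstracts and the value `k=20` are assumptions made purely for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic runs (a stand-in for real preprocessing)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

def select_features(docs, k):
    """Keep the k terms with the highest TF-IDF summed over the corpus (assumed rule)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    total = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            total[term] += (count / len(doc)) * math.log(n / df[term])
    return sorted(total, key=lambda t: (-total[t], t))[:k]

def train_naive_bayes(docs, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing over the selected terms."""
    vocab = set(vocab)
    model = {}
    for c in set(labels):
        counts = Counter(t for d, y in zip(docs, labels) if y == c
                         for t in d if t in vocab)
        denom = sum(counts.values()) + len(vocab)
        log_prior = math.log(labels.count(c) / len(labels))
        model[c] = (log_prior,
                    {t: math.log((counts[t] + 1) / denom) for t in vocab})
    return model

def log_odds(model, doc):
    """Log-odds of class 1 versus class 0; out-of-vocabulary terms are skipped."""
    def joint(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[t] for t in doc if t in log_lik)
    return joint(1) - joint(0)

def evaluate(y_true, y_pred):
    """Confusion-matrix measures; sensitivity and recall are the same quantity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    ratio = lambda a, b: a / b if b else 0.0
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy abstracts: label 1 = on-topic (non-coding genes), 0 = off-topic.
train_texts = [
    "non-coding RNA regulates gene expression",
    "microRNA is a non-coding gene product",
    "patients received surgery for hip fracture",
    "clinical trial of blood pressure drug",
]
train_labels = [1, 1, 0, 0]
test_texts = [
    "long non-coding RNA and gene silencing",
    "surgery outcomes in elderly patients",
]
test_labels = [1, 0]

train_docs = [tokenize(t) for t in train_texts]
vocab = select_features(train_docs, k=20)  # illustrative; the study reports 1 000 features
model = train_naive_bayes(train_docs, train_labels, vocab)
scores = [log_odds(model, tokenize(t)) for t in test_texts]
predictions = [1 if s > 0 else 0 for s in scores]
metrics = evaluate(test_labels, predictions)
metrics["auc"] = auc(test_labels, scores)
print(metrics)
```

On a real corpus the labelled abstracts would come from PubMed and be split into training and test sets; the toy data here only exercises each step of the sketch and says nothing about the performance figures reported in the study.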
