Authors: Lu Zhang, Jian-fu An, Hui-ping Xue, Ying Chen, Jian-guo Wu
DOI: 10.1007/S12204-012-1311-Z
Keywords: Term (time), Receiver operating characteristic, Sensitivity (control systems), Web application, tf–idf, Computer science, Bayesian algorithm, Network model, Data mining, Recall rate
Abstract: With the upsurge in biomedical literature, using data-mining methods to discover new knowledge from the literature has drawn increasing attention from scholars. In this study, taking the mining of non-coding gene literature from the PubMed network database as an example, we first preprocessed the abstract data, next applied term frequency (TF) and inverse document frequency (IDF), combined as TF-IDF, to select features, and then established a model based on a Bayesian algorithm. Finally, we assessed the model through the area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, precision rate, and recall rate. When 1,000 features are selected, the AUC, accuracy rate, and the other metrics are 0.8683, 84.63%, 89.02%, 86.83%, 89.02%, and 98.14%, respectively. These results indicate that our method can effectively identify targeted literature related to a particular topic.
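The pipeline described in the abstract (TF-IDF feature selection, a Bayesian classifier, then evaluation) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes whitespace tokenization, ranks terms by their TF-IDF score summed over the corpus, and uses a multinomial naive Bayes classifier with add-one smoothing; the abstract specifies none of these details. The toy documents, labels, and all function names are invented for illustration, and AUC is omitted for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for the
    # unspecified preprocessing step.
    return text.lower().split()


def select_features(docs, k):
    """Keep the k terms with the highest TF-IDF score summed over all docs."""
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            scores[term] += (count / len(toks)) * math.log(n / df[term])
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def train_nb(docs, labels, vocab):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    vocab_set = set(vocab)
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(t for t in tokenize(doc) if t in vocab_set)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    return prior, loglik


def predict(model, doc):
    """Return the class with the highest log posterior score."""
    prior, loglik = model
    scores = {
        c: prior[c] + sum(loglik[c][t] for t in tokenize(doc) if t in loglik[c])
        for c in prior
    }
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


# Invented toy corpus: label 1 = on-topic (non-coding gene), 0 = off-topic.
toy_docs = [
    "lncrna regulates gene expression",
    "microrna targets gene silencing",
    "noncoding rna regulates chromatin",
    "protein kinase signaling pathway",
    "enzyme kinase protein structure",
    "protein folding enzyme pathway",
]
toy_labels = [1, 1, 1, 0, 0, 0]

vocab = select_features(toy_docs, k=8)
model = train_nb(toy_docs, toy_labels, vocab)
```

In the study the feature count is 1,000; the sketch uses `k=8` only because the toy corpus is tiny. Calling `predict(model, "gene regulates expression")` classifies a new abstract, and `evaluate` turns true and predicted labels into the confusion-matrix metrics.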