Supporting HIV literature screening with data sampling and supervised learning

Authors: Hayda Almeida, Marie-Jean Meurs, Leila Kosseim, Adrian Tsang

DOI: 10.1109/BIBM.2015.7359733

Abstract: This paper presents a supervised learning approach to support the screening of HIV literature. The manual screening of biomedical literature is an important task in the process of systematic reviews. Researchers and curators have the very demanding, time-consuming, and error-prone task of manually identifying documents that must be included in a systematic review concerning a specific problem. We implemented a supervised learning approach to support screening tasks by automatically flagging potentially selected documents in a list retrieved by a literature database search. To overcome the main issues associated with the automatic literature screening task, we evaluated the use of data sampling, feature combinations, and feature selection methods, generating a total of 105 classification models. The models yielding the best results were composed of a Logistic Model Trees classifier, a fairly balanced training set, and a feature combination of Bag-Of-Words and MeSH terms. According to our results, the system correctly labels the great majority of relevant documents, and it could be used to support HIV systematic reviews to allow researchers to assess a greater number of documents in less time.
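The two preprocessing ideas named in the abstract, data sampling toward a fairly balanced training set and combining Bag-Of-Words with MeSH terms, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `ratio` parameter, and the toy documents are hypothetical, random undersampling is only one possible sampling method, and no classifier (such as Logistic Model Trees) is trained here.

```python
import random
from collections import Counter


def undersample(docs, labels, ratio=1.0, seed=0):
    """Randomly keep at most `ratio` negatives per positive document.

    Assumption: label 1 = relevant (included in the review), 0 = irrelevant.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [docs[i] for i in keep], [labels[i] for i in keep]


def features(text, mesh_terms):
    """Combine bag-of-words counts and MeSH-term indicators in one dict."""
    feats = Counter("bow=" + tok for tok in text.lower().split())
    for term in mesh_terms:
        feats["mesh=" + term] = 1  # binary indicator per MeSH term
    return dict(feats)


# Toy, made-up corpus: (abstract text, MeSH terms) pairs with labels.
DOCS = [
    ("HIV treatment trial HIV", ["HIV Infections"]),
    ("antiretroviral therapy outcomes", ["HIV Infections"]),
    ("crop yield study", ["Agriculture"]),
    ("protein folding simulation", ["Protein Folding"]),
    ("galaxy survey data", ["Astronomy"]),
    ("traffic flow model", ["Transportation"]),
]
LABELS = [1, 1, 0, 0, 0, 0]

sampled_docs, sampled_labels = undersample(DOCS, LABELS, ratio=1.0, seed=0)
feature_dicts = [features(text, mesh) for text, mesh in sampled_docs]
```

With `ratio=1.0` the sampled set contains as many irrelevant as relevant documents; a larger ratio keeps more of the majority class. The resulting feature dictionaries could then be passed to any classifier that accepts sparse features.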
