Predicting Breast Cancer Patient Survival Using Machine Learning

Authors: David Solti, Haijun Zhai

DOI: 10.1145/2506583.2512376

Abstract: Our null hypothesis was that a computer algorithm will not predict breast cancer patients' 10-year survival with greater accuracy than the 64.3% baseline of the Surveillance, Epidemiology, and End Results (SEER) database [3]. The aims of this study were to (1) build an infrastructure to convert SEER data into a machine-readable format; (2) train Machine Learning (ML) algorithms to predict survival; and (3) measure the accuracy of the predictive ML algorithms. We downloaded 657,711 clinical and demographic characteristics from SEER and converted them into machine-readable feature vectors. An oncologist generated a list of potential predictor variables. We trained the WEKA package's Logistic Regression (LR), Naive Bayes, and C4.5 Decision Tree classifiers on these feature vectors using ten-fold cross validation. LR, Naive Bayes, and C4.5 Decision Tree achieved accuracies of 76.29%, 59.71%, and 77.43%, respectively. We compared the results of LR with those of a well-known website, Adjuvant! Online. We rejected the null hypothesis for LR and C4.5 Decision Tree, but failed to reject it for Naive Bayes. Of the algorithms tested, C4.5 Decision Tree proved to be the most accurate predictor of patient survival at ten years. In addition, LR provided more predictions, without Adjuvant!'s limitations.
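The evaluation protocol described above (ten-fold cross validation, with each classifier's accuracy compared against a fixed baseline) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' WEKA pipeline: it hand-implements a categorical naive Bayes in place of WEKA's classifiers (LR and C4.5 are omitted for brevity), it assumes the baseline behaves like a majority-class predictor, and it runs on hypothetical synthetic records rather than SEER data. The function names `make_toy_records`, `train_naive_bayes`, `train_majority`, `k_fold_indices`, and `cross_val_accuracy`, and the toy features `stage`, `grade`, and `nodes`, are all illustrative.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_majority(X, y):
    """Baseline model (assumed): always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def train_naive_bayes(X, y, alpha=1.0):
    """Categorical naive Bayes with add-alpha (Laplace) smoothing; returns a predict function."""
    class_counts = Counter(y)
    n_features = len(X[0])
    # counts[c][j][v]: number of training rows of class c whose feature j has value v
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    values = [set() for _ in range(n_features)]
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)
    total = len(y)

    def predict(x):
        best_label, best_score = None, -math.inf
        for c, n_c in class_counts.items():
            # log P(c) + sum_j log P(x_j | c), accumulated in log space to avoid underflow
            score = math.log(n_c / total)
            for j, v in enumerate(x):
                score += math.log((counts[c][j][v] + alpha) / (n_c + alpha * len(values[j])))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    return predict


def cross_val_accuracy(X, y, train_fn, k=10, seed=0):
    """Pooled k-fold accuracy: every row is predicted once by a model trained without it."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(y)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        model = train_fn(train_X, train_y)
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


def make_toy_records(n=500, seed=42):
    """Hypothetical synthetic records (stage, grade, nodes) -> survived; NOT SEER data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        stage = rng.choice(["I", "II", "III", "IV"])
        grade = rng.choice(["1", "2", "3"])
        nodes = rng.choice(["neg", "pos"])
        survived = stage in ("I", "II")  # outcome driven by stage ...
        if rng.random() < 0.1:           # ... with 10% label noise
            survived = not survived
        X.append((stage, grade, nodes))
        y.append(survived)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_records()
    baseline = cross_val_accuracy(X, y, train_majority)
    nb = cross_val_accuracy(X, y, train_naive_bayes)
    print(f"majority baseline: {baseline:.3f}  naive Bayes: {nb:.3f}")
```

Each record is scored exactly once by a model trained on the other nine folds, so the returned figure is a pooled out-of-sample accuracy. A classifier that cannot beat the baseline under this protocol, as Naive Bayes failed to beat 64.3% in the study, adds no accuracy over simply predicting the most common outcome.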

References (2)
Ivo A. Olivotto, Chris D. Bajdik, Peter M. Ravdin, Caroline H. Speers, Andrew J. Coldman, Brian D. Norris, Greg J. Davis, Stephen K. Chia, Karen A. Gelmon. Population-Based Validation of the Prognostic Model ADJUVANT! for Early Breast Cancer. Journal of Clinical Oncology, vol. 23, pp. 2716-2725, 2005. DOI: 10.1200/JCO.2005.06.178
David Hajage, Yann de Rycke, Marc Bollet, Alexia Savignoni, Martial Caly, Jean-Yves Pierga, Hugo M. Horlings, Marc J. Van de Vijver, Anne Vincent-Salomon, Brigitte Sigal-Zafrani, Claire Senechal, Bernard Asselain, Xavier Sastre, Fabien Reyal. External Validation of Adjuvant! Online Breast Cancer Prognosis Tool. Prioritising Recommendations for Improvement. PLoS ONE, vol. 6, e27446, 2011. DOI: 10.1371/JOURNAL.PONE.0027446