A two-stage modeling approach for breast cancer survivability prediction

Authors: Alexa Mondello, Zahra Sedighi-Maman

DOI: 10.1016/J.IJMEDINF.2021.104438

Abstract:
Background: Despite the increasing number of studies on breast cancer survival prediction, little attention has been paid to deceased patients and their survival lengths. Moreover, developing a model that is both accurate and interpretable remains a challenge.
Objective: This paper proposes a two-stage data analytic framework, where Stage I classifies survival and deceased statuses and Stage II predicts the number of survival months for deceased females with breast cancer. Since medical data are neither entirely clean nor prepared for model development, we aim to show that data preparation can strengthen a simple Generalized Linear Model (GLM) to predict as accurately as complex models like Extreme Gradient Boosting (XGB) and Multilayer Perceptron based on Artificial Neural Networks (MLP-ANNs) in both stages.
Methods: In Stage I, we use recent Surveillance, Epidemiology, and End Results (SEER) data from 2004 to 2016 to predict short-term survival statuses from 6 months to 3 years in 6-month increments. The Synthetic Minority Over-sampling Technique (SMOTE), Relocating Safe-Level SMOTE (RSLS), and Adaptive Synthetic (ADASYN) re-sampling techniques, and the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) feature selection methods, along with integer and one-hot encoding, are combined with three popular data mining methods: GLM, XGB, and MLP. In Stage II, we predict the number of survival months for patients who are correctly predicted as deceased within 3 years. Again, we employ GLM, XGB, and MLP regression models with LASSO and RF feature selection, and integer and one-hot encoding to encode categorical features.
Results: Using OneHotEncoding-GLM-LASSO-ADASYN, we obtain Area Under the Receiver Operating Characteristic Curve (AUC) values between 0.852 and 0.900 (0.900, 0.898, 0.877, 0.852, and 0.858 as listed) across the 6-month, 1-, 1.5-, 2-, 2.5-, and 3-year time-points. The change in Odds Ratio in the GLM models manifests the impact of individual levels of categorical and numerical features on the odds of death. We obtain a Mean Absolute Error (MAE) of 7.960 using OneHotEncoding-GLM-LASSO when predicting the number of survival months for deceased patients. We also present the top contributing features and their coefficients to illustrate how the presence of each feature alters the number of survival months.
Conclusion: To the best of our knowledge, this is the first study that implements a two-stage classification and regression approach. All data-driven findings are presented in order to assist clinicians in making better care decisions with an interpretable and computationally efficient method for predicting survival status and survival lengths for deceased patients, and to help foster human and machine interactions.
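The two-stage flow and the two interpretability quantities named in the abstract (odds ratio and MAE) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the function names, the `risk` field, and the toy stand-in models are hypothetical, and the odds-ratio helper assumes the Stage I GLM is a logistic regression, where a coefficient beta corresponds to an odds ratio of exp(beta) for a one-unit increase in the feature.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, float]  # hypothetical patient record: feature name -> value


def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a one-unit feature increase, assuming a logistic GLM."""
    return math.exp(coefficient)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted survival months."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def two_stage_predict(
    rows: Sequence[Row],
    classify_deceased: Callable[[Row], bool],
    predict_months: Callable[[Row], float],
) -> List[Optional[float]]:
    """Stage I flags patients predicted as deceased within the horizon;
    Stage II estimates survival months only for those flagged patients.
    Patients predicted to survive get None."""
    return [predict_months(r) if classify_deceased(r) else None for r in rows]


# Toy stand-ins for fitted models (purely illustrative, not from the paper).
def toy_classifier(row: Row) -> bool:
    return row["risk"] >= 0.5


def toy_regressor(row: Row) -> float:
    return 36.0 * (1.0 - row["risk"])


patients = [{"risk": 0.2}, {"risk": 0.75}]
predictions = two_stage_predict(patients, toy_classifier, toy_regressor)
```

In this sketch, only the second toy patient reaches Stage II. In practice the two callables would wrap fitted models (for example a GLM after one-hot encoding, LASSO selection, and ADASYN re-sampling), and those components are outside the scope of this illustration.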

References (37)
Muhammad Umer Khan, Jong Pill Choi, Hyunjung Shin, Minkoo Kim. Predicting breast cancer survivability using fuzzy decision trees for personalized healthcare. International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2008, pp. 5148-5151 (2008). DOI: 10.1109/IEMBS.2008.4650373
Haibo He, Yang Bai, Edwardo A. Garcia, Shutao Li. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. International Joint Conference on Neural Networks, pp. 1322-1328 (2008). DOI: 10.1109/IJCNN.2008.4633969
Juhyeon Kim, Hyunjung Shin. Breast cancer survivability prediction using labeled, unlabeled, and pseudo-labeled patient data. Journal of the American Medical Informatics Association, vol. 20, pp. 613-618 (2013). DOI: 10.1136/AMIAJNL-2012-001570
Yonghyun Nam, Hyunjung Shin. A Hybrid Cancer Prognosis System Based on Semi-Supervised Learning and Decision Trees. Neural Information Processing, pp. 640-648 (2013). DOI: 10.1007/978-3-642-42042-9_79
Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, W. Philip Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, vol. 16, pp. 321-357 (2002). DOI: 10.1613/JAIR.953
E Guven, B Abdelghani. Predicting breast cancer survivability using data mining techniques. SIAM International Conference on Data Mining (2006).
Barbara McAneny, Jennifer Webster, Angela Kim, David Dooling. Personalized Prognostic Models for Oncology: A Machine Learning Approach. arXiv: Applications (2016).
Rohit J. Kate, Ramya Nadig. Stage-specific predictive models for breast cancer survivability. International Journal of Medical Informatics, vol. 97, pp. 304-311 (2017). DOI: 10.1016/J.IJMEDINF.2016.11.001
Zahra Sedighi Maman, Mohammad Ali Alamdar Yazdi, Lora A. Cavuoto, Fadel M. Megahed. A data-driven approach to modeling physical fatigue in the workplace using wearable sensors. Applied Ergonomics, vol. 65, pp. 515-529 (2017). DOI: 10.1016/J.APERGO.2017.02.001
Chip M. Lynch, Behnaz Abdollahi, Joshua D. Fuqua, Alexandra R. de Carlo, James A. Bartholomai, Rayeanne N. Balgemann, Victor H. van Berkel, Hermann B. Frieboes. Prediction of lung cancer patient survival via supervised machine learning classification techniques. International Journal of Medical Informatics, vol. 108, pp. 1-8 (2017). DOI: 10.1016/J.IJMEDINF.2017.09.013