Authors: Alexa Mondello, Zahra Sedighi-Maman
DOI: 10.1016/J.IJMEDINF.2021.104438
Keywords:
Abstract: Background: Despite the increasing number of studies in breast cancer survival prediction, little attention has been paid to deceased patients and their survival lengths. Moreover, developing a model that is both accurate and interpretable remains a challenge. Objective: This paper proposes a two-stage data analytic framework, where Stage I classifies survival statuses and Stage II predicts survival months for deceased females with breast cancer. Since medical data are neither entirely clean nor prepared for model development, we aim to show that data preparation can strengthen a simple Generalized Linear Model (GLM) to predict as accurately as complex models like Extreme Gradient Boosting (XGB) and the Multilayer Perceptron based on Artificial Neural Networks (MLP-ANNs) in both stages. Methods: In Stage I, we use recent Surveillance, Epidemiology, and End Results (SEER) data from 2004 to 2016 to predict short-term survival statuses from 6 months to 3 years in 6-month increments. The Synthetic Minority Over-sampling Technique (SMOTE), Relocating Safe-Level SMOTE (RSLS), and Adaptive Synthetic (ADASYN) re-sampling techniques, the Least Absolute Shrinkage and Selection Operator (LASSO) and Random Forest (RF) feature selection methods, along with integer and one-hot encoding, are combined with three popular data mining methods: GLM, XGB, and MLP. In Stage II, we predict survival months for patients who are correctly predicted as deceased within 3 years. Again, we employ GLM, XGB, and MLP regression with LASSO and RF feature selection, and integer and one-hot encoding to encode categorical features. Results: We obtain Area Under the Receiver Operating Characteristic Curve (AUC) values of 0.900, 0.898, 0.877, 0.852, and 0.858 across the 6-month, 1-, 1.5-, 2-, 2.5-, and 3-year time-points using OneHotEncoding-GLM-LASSO-ADASYN. The change in Odds Ratio in GLM manifests the impact of individual levels of categorical and numerical features on the odds of death. We obtain a Mean Absolute Error (MAE) of 7.960 using OneHotEncoding-GLM-LASSO when predicting survival months for deceased patients. We present the top contributing features by coefficient to illustrate how the presence of each feature alters the survival months. Conclusion: To the best of our knowledge, this is the first study that implements a two-stage classification and regression approach. All data-driven findings are presented in order to assist clinicians in making better care decisions with an accurate, interpretable, and computationally efficient method for predicting the survival status and survival lengths of patients, and to help foster human and machine interactions.
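
The two-stage routing and the three quantities the abstract reports (AUC, MAE, and the odds ratio of a GLM coefficient) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the stand-in models, and the toy numbers are all assumptions made for illustration, and no SEER data is involved.

```python
import math
from typing import Callable, List, Optional, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic-regression coefficient: exp(beta)."""
    return math.exp(coefficient)


def two_stage_predict(
    rows: Sequence[dict],
    death_probability: Callable[[dict], float],  # hypothetical Stage I model
    survival_months: Callable[[dict], float],    # hypothetical Stage II model
    threshold: float = 0.5,                      # assumed cut-off, not from the paper
) -> List[Optional[float]]:
    """Stage I flags rows predicted as deceased; Stage II estimates survival
    months only for those rows. Other rows get None."""
    results: List[Optional[float]] = []
    for row in rows:
        if death_probability(row) >= threshold:
            results.append(survival_months(row))
        else:
            results.append(None)
    return results


if __name__ == "__main__":
    # Toy, invented inputs purely to exercise the functions.
    patients = [{"risk": 0.9, "months": 8.0}, {"risk": 0.2, "months": 30.0}]
    routed = two_stage_predict(
        patients,
        death_probability=lambda r: r["risk"],
        survival_months=lambda r: r["months"],
    )
    print(routed)
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    print(mae([10.0, 20.0], [12.0, 17.0]))
    print(odds_ratio(0.0))
```

In the study itself, Stage II is described as operating on patients correctly predicted as deceased within 3 years; the sketch simply routes on the Stage I prediction, since ground-truth labels are unavailable at prediction time.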