Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values

Authors: Pedro J. García-Laencina, Pedro Henriques Abreu, Miguel Henriques Abreu, Noémia Afonso

DOI: 10.1016/J.COMPBIOMED.2015.02.006


Abstract: Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. However, most standard prediction methods are not able to handle incomplete samples, and missing data imputation is therefore a widely applied approach for solving this problem. Taking into account the characteristics of each breast cancer dataset, a detailed analysis is required to determine the most appropriate imputation and prediction methods in each clinical environment. This research work analyzes a real breast cancer dataset from the Institute Portuguese of Oncology of Porto with a high percentage of unknown categorical information (the clinical data of most patients are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation, and 5-year survival prediction from the cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure and, according to the obtained results, the best results are provided by the K-Nearest Neighbors algorithm: more than 81% accuracy and more than 0.78 area under the Receiver Operator Characteristic curve, which constitutes very good results in this complex scenario.

Highlights: A 5-year survival prediction model in the breast cancer context. The dataset has high complexity due to its missing data ratio. Several representative imputation and decision methods are analyzed. The obtained results are interesting and accurate for this dataset.
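To make the difference between two of the imputation scenarios concrete, the following is a minimal sketch of Mode imputation and K-Nearest Neighbors imputation on categorical records. It is not the authors' implementation. The `None` encoding for unknown values, the Hamming-style mismatch distance, the majority vote among neighbors, and the toy columns (stage, receptor status, grade) are all assumptions made for illustration only.

```python
from collections import Counter

MISSING = None  # assumption: unknown categorical values are encoded as None


def mode_impute(rows):
    """Replace each missing value with the most frequent observed value of its column."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def mismatch_rate(a, b):
    """Share of mismatching values over columns observed in both rows (inf if none)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(rows, k=3):
    """Fill each missing value by majority vote among the k nearest rows observing that column."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Candidate donors: every other row with an observed value in column j.
            donors = sorted(
                (mismatch_rate(row, other), idx)
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not MISSING
            )
            votes = Counter(rows[idx][j] for _, idx in donors[:k])
            if votes:
                filled[i][j] = votes.most_common(1)[0][0]
    return filled


# Hypothetical toy records: (stage, receptor status, grade)
rows = [
    ["I", "pos", "low"],
    ["I", "pos", None],
    ["II", "neg", "high"],
    ["II", None, "high"],
    ["II", "neg", "high"],
    ["I", "pos", "low"],
]

mode_filled = mode_impute(rows)
knn_filled = knn_impute(rows, k=3)
print(mode_filled[1], knn_filled[1])
print(mode_filled[3], knn_filled[3])
```

On this toy data the two strategies disagree: Mode imputation fills every gap with the column-wide majority, while the neighbor-based variant fills it from the records most similar to the incomplete one. This kind of difference is why the choice of imputation method can matter for the downstream survival classifier.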


