Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes.

Authors: Peter C. Austin, Jack V. Tu, Jennifer E. Ho, Daniel Levy, Douglas S. Lee

DOI: 10.1016/J.JCLINEPI.2012.11.008

Keywords: Regression analysis, Random forest, Machine learning, Bootstrap aggregating, Regression, Artificial intelligence, Boosting (machine learning), Data mining, Logistic regression, Population, Support vector machine, Computer science

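Abstract: Objective: Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study Design and Setting: We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) according to the following subtypes: HF with preserved ejection fraction (HFPEF) and HF with reduced ejection fraction. We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results: We found that modern, flexible tree-based methods from the data-mining literature offer substantial improvement in prediction and classification of HF subtype compared with conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared with the methods proposed in the data-mining literature. Conclusion: The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying HF subtypes in a population-based sample of patients from Ontario, Canada. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF.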
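
As an illustration of one of the schemes named in the abstract, bootstrap aggregation (bagging), the following is a minimal sketch using only the Python standard library. It is not the authors' analysis: the data are synthetic (no heart failure data are used), one-split decision stumps stand in for full classification trees, and every name here (`make_data`, `fit_stump`, `fit_bagged`, `accuracy`) is hypothetical and introduced only for this example.

```python
import random
from collections import Counter

def make_data(n, rng):
    """Synthetic data (an assumption): label is 1 when x0 + x1 > 1, with 10% label noise."""
    rows = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = int(x[0] + x[1] > 1.0)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        rows.append((x, y))
    return rows

def fit_stump(rows):
    """Fit a one-split tree: choose the feature, threshold, and direction with the fewest training errors."""
    best = None  # (errors, feature, threshold, label_if_above)
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            for above in (0, 1):
                errors = sum(
                    (above if xi[f] > t else 1 - above) != yi for xi, yi in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return lambda x: above if x[f] > t else 1 - above

def fit_bagged(rows, n_models, rng):
    """Bagging: fit one stump per bootstrap resample, then predict by majority vote."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        models.append(fit_stump(sample))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

rng = random.Random(0)
train = make_data(150, rng)
test = make_data(500, rng)

single = fit_stump(train)
bagged = fit_bagged(train, 25, rng)  # odd count avoids tied votes

acc_single = accuracy(single, test)
acc_bagged = accuracy(bagged, test)
print(f"single stump accuracy: {acc_single:.3f}")
print(f"bagged stumps accuracy: {acc_bagged:.3f}")
```

Whether the bagged ensemble beats the single stump depends on the data and the base learner; the sketch only shows the mechanics of resampling and voting and makes no claim about the size of any improvement.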
