Using Machine Learning Techniques to Identify Key Risk Factors for Diabetes and Undiagnosed Diabetes

Author: Avraham Adler

Abstract: This paper reviews a wide selection of machine learning models built to predict both the presence of diabetes and the presence of undiagnosed diabetes, using eight years of National Health and Nutrition Examination Survey (NHANES) data. Models are tuned and compared via their Brier scores. The most relevant variables of the best-performing models are then compared. A support vector machine with a linear kernel performed best for predicting diabetes, returning a Brier score of 0.0654 and an AUROC of 0.9235 on the test set. An elastic net regression performed best for predicting undiagnosed diabetes, with a Brier score of 0.0294 and an AUROC of 0.9439 on the test set. Similar features appear prominently in the variable sets of the best-performing models. Blood osmolality, family history, the prevalence of various compounds, and hypertension are key indicators of all diabetes risk. For undiagnosed diabetes in particular, ethnicity or genetic components also arise as strong correlates.
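The abstract ranks models by Brier score and also reports AUROC. The sketch below shows how these two metrics are commonly computed for a binary outcome. It is an illustrative pure-Python version and not the author's code. The function names `brier_score` and `auroc` and the four-case toy data are hypothetical. The Brier score is the mean of (p_i - y_i)^2 over all cases, where p_i is the predicted probability and y_i is the 0/1 outcome, so lower is better and 0 is perfect. AUROC is computed here from its rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better; 0.0 means every probability was exactly right.
    """
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation.

    Counts, over every (positive, negative) pair, how often the positive case
    has the higher score; ties contribute one half.
    """
    if len(y_true) != len(scores):
        raise ValueError("inputs must be of equal length")
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 1 = diabetic, 0 = not diabetic.
    y = [0, 0, 1, 1]
    p = [0.10, 0.40, 0.35, 0.80]  # a model's predicted probabilities
    print("Brier:", round(brier_score(y, p), 4))
    print("AUROC:", auroc(y, p))
```

The pairwise AUROC loop costs O(n_pos × n_neg). That is acceptable for illustration, but for a data set the size of NHANES a sort-based rank computation would be the usual choice.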
