Authors: Yan V. Sun, Lawrence F. Bielak, Patricia A. Peyser, Stephen T. Turner, Patrick F. Sheedy
DOI: 10.1002/GEPI.20309
Keywords: Artificial intelligence, Genetic epidemiology, Percentile, Subclinical infection, Body mass index, Lipoprotein particle, Random forest, Machine learning, Algorithm, Single-nucleotide polymorphism, Homocysteine, Medicine, Genetics (clinical), Epidemiology
Abstract: As part of the Genetic Epidemiology Network of Arteriopathy study, hypertensive non-Hispanic White sibships were screened using 471 single nucleotide polymorphisms (SNPs) to identify genes influencing coronary artery calcification (CAC) measured by computed tomography. Individuals with detectable CAC and CAC quantity ≥70th age- and sex-specific percentile were classified as having a high CAC burden and compared to individuals with CAC quantity <70th percentile. Two sibs from each sibship were randomly chosen and divided into two data sets, each with 360 unrelated individuals. Within each data set, we applied two machine learning algorithms, Random Forests and RuleFit, to identify the best predictors of high CAC burden among 17 risk factors and 471 SNPs. Using five-fold cross-validation, both methods had 70% sensitivity and 60% specificity. Prediction accuracies were significantly different from random predictions (P-value < 0.001) based on 1,000 permutation tests. Predictability using 287 tagSNPs was as good as using all 471 SNPs. For Random Forests, among the top 50 predictors, the same eight tagSNPs and 15 risk factors were found in both data sets, while 12 risk factors were found in both data sets for RuleFit. Replicable effects of tagSNPs (in GPR35 and NOS3) and risk factors (age, body mass index, sex, serum glucose, high-density lipoprotein cholesterol, systolic blood pressure, homocysteine, triglycerides, fibrinogen, Lp(a), and low-density lipoprotein particle size) were identified by both methods. This study illustrates how machine learning methods can be used to identify important, replicable predictors of subclinical atherosclerosis. Genet. Epidemiol. 32:350–360, 2008. © 2008 Wiley-Liss, Inc.
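The abstract reports sensitivity, specificity, and a permutation-based P-value for prediction accuracy. The following is a minimal Python sketch of how such quantities can be computed, not the authors' code. The function names, the toy labels, and the simplification of shuffling labels against fixed predictions (rather than refitting Random Forests or RuleFit on every permutation) are all assumptions made for illustration.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 = high CAC burden.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Empirical P-value for accuracy under randomly shuffled labels.

    Simplification (assumption): predictions stay fixed and only the
    labels are shuffled; a full analysis would refit the model each time.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 high-burden and 10 low-burden individuals.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 7 + [0] * 3 + [0] * 6 + [1] * 4

sens, spec = sensitivity_specificity(y_true, y_pred)
p_value = permutation_p_value(y_true, y_pred)
print(sens, spec, p_value)
```

On the toy data, 7 of 10 high-burden and 6 of 10 low-burden individuals are classified correctly, so sensitivity is 0.7 and specificity is 0.6. In a cross-validated setting, `y_pred` would hold the held-out predictions pooled over the five folds.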