Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs that Associate with Disease

Authors: Richard T. Guy, Peter Santago, Carl D. Langefeld

DOI: 10.1002/GEPI.21608

Abstract: Complex genetic disorders are a result of a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify the multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm, denoted as Bagged Alternating Decision Trees (BADTrees), that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is of order nk^2, where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone, and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of BADTrees using simulated data as well as data from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.
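The bootstrap-aggregating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' BADTrees algorithm: the abstract does not give its details, so the alternating decision tree is replaced here by a single-SNP decision stump, and "common structural elements" is reduced to how often each SNP is chosen across bootstrap replicates. The function names (`best_stump`, `bagged_snp_frequency`), the 0/1/2 genotype coding, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def best_stump(genotypes, labels, rows):
    """Pick the SNP whose carrier / non-carrier split best separates
    cases (label 1) from controls (label 0) on the given rows.
    This stump is a simple stand-in for an ADTree, not an ADTree."""
    best_snp, best_score = None, -1.0
    for j in range(len(genotypes[0])):
        carriers = [labels[i] for i in rows if genotypes[i][j] > 0]
        others = [labels[i] for i in rows if genotypes[i][j] == 0]
        if not carriers or not others:
            continue  # this SNP cannot split the sample
        # Absolute difference in case proportion between the two groups.
        score = abs(sum(carriers) / len(carriers) - sum(others) / len(others))
        if score > best_score:
            best_snp, best_score = j, score
    return best_snp


def bagged_snp_frequency(genotypes, labels, n_bootstrap=50, seed=0):
    """Fit one stump per bootstrap replicate and return the fraction of
    replicates in which each SNP was selected."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_bootstrap):
        # Bootstrap replicate: n individuals drawn with replacement.
        rows = [rng.randrange(n) for _ in range(n)]
        snp = best_stump(genotypes, labels, rows)
        if snp is not None:
            counts[snp] += 1
    total = sum(counts.values())
    return {snp: c / total for snp, c in counts.items()}


# Hypothetical toy data: 40 individuals, 4 SNPs coded 0/1/2.
# SNP index 2 is constructed to track case status; the rest do not.
labels = [i % 2 for i in range(40)]
genotypes = [
    [(i // 2) % 3, (i // 4) % 3, labels[i] * (1 + (i // 2) % 2), (i // 8) % 3]
    for i in range(40)
]
freq = bagged_snp_frequency(genotypes, labels)
print(freq)
```

SNPs selected in a large share of replicates are the stable candidates. A fuller version would fit a multi-node tree per replicate and compare shared substructures rather than single splits.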

References (10)
Bernhard Pfahringer, Geoffrey Holmes, Richard Kirkby. Optimizing the Induction of Alternating Decision Trees. Pacific-Asia Conference on Knowledge Discovery and Data Mining, vol. 2035, pp. 477-487 (2001). DOI: 10.1007/3-540-45357-1_50
International Consortium for Systemic Lupus Erythematosus Genetics (SLEGEN), John B Harley, Marta E Alarcón-Riquelme, Lindsey A Criswell, Chaim O Jacob, Robert P Kimberly, Kathy L Moser, Betty P Tsao, Timothy J Vyse, Carl D Langefeld. Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants in ITGAM, PXK, KIAA1542 and other loci. Nature Genetics, vol. 40, pp. 204-210 (2008). DOI: 10.1038/NG.81
Alexandre Bureau, Josée Dupuis, Kathleen Falls, Kathryn L. Lunetta, Brooke Hayward, Tim P. Keith, Paul Van Eerdewegh. Identifying SNPs predictive of phenotype using random forests. Genetic Epidemiology, vol. 28, pp. 171-182 (2005). DOI: 10.1002/GEPI.20041
Kuang-Yu Liu, Jennifer Lin, Xiaobo Zhou, Stephen TC Wong. Boosting Alternating Decision Trees Modeling of Disease Trait Information. BMC Genetics, vol. 6, pp. 1-6 (2005). DOI: 10.1186/1471-2156-6-S1-S132
Tricia A. Thornton-Wells, Jason H. Moore, Jonathan L. Haines. Genetics, statistics and human disease: analytical retooling for complexity. Trends in Genetics, vol. 20, pp. 640-647 (2004). DOI: 10.1016/J.TIG.2004.09.007
Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten. The WEKA data mining software. ACM SIGKDD Explorations Newsletter, vol. 11, pp. 10-18 (2009). DOI: 10.1145/1656274.1656278
David J. Miller, Yanxin Zhang, Guoqiang Yu, Yongmei Liu, Li Chen, Carl D. Langefeld, David Herrington, Yue Wang. An algorithm for learning maximum entropy probability models of disease risk that efficiently searches and sparingly encodes multilocus genomic interactions. Bioinformatics, vol. 25, pp. 2478-2485 (2009). DOI: 10.1093/BIOINFORMATICS/BTP435