Data-driven encoding for quantitative genetic trait prediction.

Authors: Dan He, Zhanyong Wang, Laxmi Parida

DOI: 10.1186/1471-2105-16-S1-S10

Keywords: Regression analysis, Genetic marker, Quantitative trait locus, Trait, Genetic architecture, Genetics, Biology, Epistasis, Linear model, Allele, Computational biology

Abstract: Given a set of biallelic molecular markers, such as SNPs, with genotype values on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative genetic trait prediction is usually represented as linear regression models, which require quantitative encodings for the genotypes: the three distinct genotype values, corresponding to one heterozygous and two homozygous alleles, are usually coded as integers and manipulated algebraically in the model. Further, epistasis between multiple markers is modeled as multiplication between the markers: it is unclear that the regression model continues to be effective under this. In this work we investigate the effects of encodings on the quantitative genetic trait prediction problem. We first showed that different encodings lead to different prediction accuracies in many test cases. We then proposed a data-driven encoding strategy, where we encode the genotypes according to their distribution in the phenotypes and allow each marker to have different encodings. We show in our experiments that this encoding strategy is able to improve the performance of the prediction method and is more helpful for oligogenic traits, whose values rely on a relatively small set of markers. To the best of our knowledge, this paper discusses ...
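The abstract describes the data-driven encoding only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes genotypes arrive coded as 0, 1, 2, and it takes "distribution in the phenotypes" to mean the mean phenotype of the samples carrying each genotype at a marker. The function name `data_driven_encoding` and the fallback to the overall phenotype mean for unobserved genotypes are hypothetical choices made for illustration.

```python
import numpy as np

def data_driven_encoding(genotypes, phenotypes):
    """Per-marker genotype encoding (illustrative assumption, not the paper's exact rule).

    genotypes : (n_samples, n_markers) integer array with values in {0, 1, 2}
    phenotypes: (n_samples,) array of quantitative trait values
    Returns (encoded, table), where table[j, g] is the code for genotype g at marker j.
    """
    genotypes = np.asarray(genotypes)
    phenotypes = np.asarray(phenotypes, dtype=float)
    n_markers = genotypes.shape[1]

    # Assumed fallback: a genotype never observed at a marker gets the overall mean.
    table = np.full((n_markers, 3), phenotypes.mean())

    for g in range(3):
        mask = genotypes == g                      # samples carrying genotype g, per marker
        counts = mask.sum(axis=0)
        sums = (mask * phenotypes[:, None]).sum(axis=0)
        observed = counts > 0
        table[observed, g] = sums[observed] / counts[observed]

    # Replace each integer genotype by its marker-specific code.
    encoded = table[np.arange(n_markers)[None, :], genotypes]
    return encoded, table

genotypes = np.array([[0, 2], [1, 2], [2, 0], [0, 1]])
phenotypes = np.array([1.0, 2.0, 3.0, 1.0])
encoded, table = data_driven_encoding(genotypes, phenotypes)
```

Because each marker gets its own lookup table, two markers can map the same genotype to different values, which is the property the abstract highlights. In a prediction setting the table would be estimated on training samples only and then applied to held-out genotypes, so that test phenotypes do not leak into the encoding.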
