Genome simulation approaches for synthesizing in silico datasets for human genomics.

Authors: Marylyn D. Ritchie, William S. Bush

DOI: 10.1016/B978-0-12-380862-2.00001-1

Keywords: Biological data, Environmental data, Data simulation, Biology, Machine learning, Genome, Human genomics, Artificial intelligence, Bioinformatics, Genomics, In silico, Software

Abstract: Simulated data is a necessary first step in the evaluation of new analytic methods because in simulated data the true effects are known. To successfully develop novel statistical and computational methods for genetic analysis, it is vital to simulate datasets consisting of single nucleotide polymorphisms (SNPs) spread throughout the genome at a density similar to that observed by high-throughput molecular genomics studies. In addition, the simulation of environmental data will be essential to properly formulate risk models for complex disorders. Data simulations are often criticized because they are much less noisy than natural biological data, as it is nearly impossible to simulate the multitude of possible sources of experimental variability. However, simulating data in silico is the most straightforward way to test the potential of new methods during development. Thus, advances that increase the complexity of data simulations will permit investigators to better assess new analytical methods. In this work, we briefly describe some of the current approaches for the simulation of human genomics data, describing the advantages and disadvantages of the various approaches. We also include details on software packages available for data simulation. Finally, we expand upon one particular approach for the creation of complex, human genomic data that uses a forward-time population simulation algorithm: genomeSIMLA. Many of the hallmark features of biological datasets can be synthesized in silico; still, much research is needed to enhance our capabilities to create datasets that capture the complexity of biological datasets.
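
To make the idea of forward-time population simulation concrete, the following is a minimal sketch in Python. It is not genomeSIMLA's algorithm or API: the function names, the single-crossover recombination model, the random-mating scheme, and the per-genotype penetrance table are all assumptions chosen for illustration. It evolves a fixed-size diploid population of biallelic SNPs forward one generation at a time, then assigns case/control status from the genotype at one hypothetical disease SNP, so the true effect is known by construction.

```python
import random


def make_founders(n_individuals, n_snps, allele_freq, rng):
    """Create diploid founders; each individual is a pair of 0/1 haplotypes."""
    def haplotype():
        return [1 if rng.random() < allele_freq else 0 for _ in range(n_snps)]
    return [(haplotype(), haplotype()) for _ in range(n_individuals)]


def gamete(individual, rng):
    """Transmit one haplotype, recombined at a single random crossover point."""
    # Pick which parental haplotype the gamete starts on.
    first, second = individual if rng.random() < 0.5 else individual[::-1]
    # A cut at 0 or at the end copies one haplotype without recombination.
    cut = rng.randint(0, len(first))
    return first[:cut] + second[cut:]


def next_generation(population, rng):
    """Random mating with constant population size (selfing is not excluded)."""
    return [
        (gamete(rng.choice(population), rng), gamete(rng.choice(population), rng))
        for _ in range(len(population))
    ]


def simulate(n_individuals=200, n_snps=20, allele_freq=0.3, generations=50, seed=1):
    """Run the forward-time simulation and return the final population."""
    rng = random.Random(seed)
    population = make_founders(n_individuals, n_snps, allele_freq, rng)
    for _ in range(generations):
        population = next_generation(population, rng)
    return population


def assign_status(population, disease_snp, penetrance, rng):
    """Label each individual as case (1) or control (0).

    `penetrance` maps the count of risk alleles at `disease_snp` (0, 1, or 2)
    to the probability of being a case.
    """
    statuses = []
    for hap_a, hap_b in population:
        genotype = hap_a[disease_snp] + hap_b[disease_snp]
        statuses.append(1 if rng.random() < penetrance[genotype] else 0)
    return statuses


if __name__ == "__main__":
    final_population = simulate()
    status = assign_status(
        final_population,
        disease_snp=5,
        penetrance={0: 0.05, 1: 0.10, 2: 0.40},  # hypothetical risk model
        rng=random.Random(2),
    )
    print(f"{sum(status)} cases among {len(status)} simulated individuals")
```

Because haplotypes are inherited and recombined across generations, correlation between nearby SNPs arises from the simulated population history rather than being imposed directly. A realistic simulator would add features this sketch omits, such as variable recombination rates, population growth, and environmental covariates.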
