Author: Duncan C. Thomas
DOI: 10.1097/01.EDE.0000229950.29674.68
Keywords:
Abstract: As David Hunter puts it so aptly in his commentary,1 Pandora's box has been opened: genomics, transcriptomics, metabolomics, proteomics, interactomics, methylomics... It seems like every day a new genome-wide technology is introduced. Is this to become "a diagnostic dream and an analytical nightmare?"2 What is the real value added that can be expected for molecular epidemiology? Much of the use of "-omics" technologies to date has been exploratory in nature.3 Analyses are typically performed in 2 stages, using a training sample to search for patterns in the high-dimensional data, followed by validation of the model's predictions in a test sample. A wide range of data-mining tools have been developed for this purpose, including hierarchical clustering, classification and regression trees, self-organizing maps, random forests, multifactor dimension reduction, and neural nets, to name a few (see Hoh and Ott4 for a partial review). I would like to explore a somewhat different paradigm: how these technologies could inform hypothesis-directed or pathway-based approaches to epidemiology and help direct analyses into more promising directions. In either case, the basic idea entails using such information in the estimation of the parameters of a model for the epidemiologic question at hand.5 For example, in a candidate gene association study of the highly polymorphic ATM gene, one might include in silico measures of evolutionary conservation or of predicted effects on protein conformation, or in vitro functional assays of each polymorphism,6 as prior covariates for the relative risk of each variant.7,8 On a broader scale, readily available genomic annotation could be used to prioritize signals from an initial scan and so improve the selection of markers to carry forward for testing in later stages of a multistage study.9,10 If the multiple comparisons problem is not daunting enough, consider studies of the genetics of expression, involving perhaps tens of thousands of expressed genes examined in relation to hundreds of single nucleotide polymorphisms (SNPs),11 or all possible gene-gene interactions12 or haplotype associations13 in a SNP scan. Here, the opportunity to exploit bioinformatics resources to focus the analysis has enormous potential that is far from being tapped.
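The idea of using prior covariates for the relative risk of each variant can be illustrated with a two-stage (hierarchical) regression: first-stage log relative-risk estimates are regressed on prior covariates such as a conservation score, and each estimate is then shrunk toward the value its covariates predict. This is a minimal sketch under stated assumptions, not the specific method of references 7 and 8: the function name semi_bayes_shrink is hypothetical, the second-stage model is assumed normal, and the residual variance tau2 is fixed in advance (a "semi-Bayes" choice) rather than estimated.

```python
import numpy as np


def semi_bayes_shrink(beta_hat, var_hat, Z, tau2):
    """Two-stage hierarchical (semi-Bayes) estimates of per-variant log relative risks.

    beta_hat : first-stage log RR estimates, one per variant (length J)
    var_hat  : their sampling variances (length J)
    Z        : J x P matrix of prior covariates (e.g. an intercept column plus
               hypothetical conservation or functional-assay scores)
    tau2     : residual variance of the true log RRs around Z @ pi, fixed in
               advance here (semi-Bayes) rather than estimated
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    Z = np.asarray(Z, dtype=float)

    # Second stage: beta_j ~ N(Z_j @ pi, tau2), so marginally
    # beta_hat_j ~ N(Z_j @ pi, var_hat_j + tau2). Fit pi by weighted least squares.
    w = 1.0 / (var_hat + tau2)
    ZtW = Z.T * w  # scales column j of Z.T by w_j
    pi_hat = np.linalg.solve(ZtW @ Z, ZtW @ beta_hat)

    # Posterior mean: precision-weighted average of each estimate and its prior mean
    prior_mean = Z @ pi_hat
    shrink = var_hat / (var_hat + tau2)
    post_mean = shrink * prior_mean + (1.0 - shrink) * beta_hat
    return pi_hat, post_mean


if __name__ == "__main__":
    # Four hypothetical variants: intercept plus a made-up conservation score
    Z = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
    beta_hat = [0.5, -0.2, 0.9, 0.1]
    var_hat = [0.04, 0.09, 0.01, 0.16]
    pi_hat, post = semi_bayes_shrink(beta_hat, var_hat, Z, tau2=0.05)
    print(pi_hat, post)
```

Variants whose first-stage estimates are imprecise (large var_hat) are pulled furthest toward the value predicted from their annotation, while precisely estimated variants barely move. The sketch ignores the uncertainty in pi_hat itself; a fuller analysis would estimate tau2 and propagate that uncertainty.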
In other circumstances, rather than relying on external bioinformatics databases, the investigator might want to apply some of these technologies directly to samples from a case-control or cohort study to better characterize the intermediate pathways.14 I think of these as providing the "missing link" among exposures, genes, and disease. The cost of such assays, difficulties in getting subject participation, or the need for special tissue preparations may preclude obtaining such measurements on all subjects in a large-scale study. Furthermore, in a case-control design, reverse causation (the biomarker being affected by the disease or its treatment, rather than the other way around) could make any simple comparison meaningless. This suggests some form of sampling15 combined with modeling of the latent process.8
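The sampling half of this suggestion can be sketched as a stratified two-phase design: phase 1 records case status and a cheap stratification variable for everyone, phase 2 assays the expensive biomarker in a fixed number of subjects per stratum, and each assayed subject is weighted by the inverse of its stratum's sampling fraction. This is a hypothetical illustration: the function names, the dictionary layout, and the equal allocation per stratum are assumptions, it is not the design of reference 15, and it does not implement the latent-process modeling or correct for reverse causation.

```python
import random
from collections import defaultdict


def two_phase_sample(phase1, n_per_stratum, seed=0):
    """Draw a stratified phase-2 subsample for the biomarker assay.

    phase1: list of dicts with hypothetical keys 'id', 'case' (0/1) and
    'stratum' (a cheap phase-1 variable, e.g. carrier status).
    Returns the selected subjects and a dict id -> sampling weight.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subj in phase1:
        strata[(subj["case"], subj["stratum"])].append(subj)

    selected, weights = [], {}
    for members in strata.values():
        n = min(n_per_stratum, len(members))
        chosen = rng.sample(members, n)
        # Weight = stratum size / number assayed (inverse sampling fraction)
        w = len(members) / n
        for subj in chosen:
            selected.append(subj)
            weights[subj["id"]] = w
    return selected, weights


def weighted_mean_by_case(selected, weights, biomarker):
    """Inverse-probability-weighted mean biomarker level among cases and controls.

    biomarker: dict id -> assayed value, available for phase-2 subjects only.
    """
    num, den = defaultdict(float), defaultdict(float)
    for subj in selected:
        w = weights[subj["id"]]
        num[subj["case"]] += w * biomarker[subj["id"]]
        den[subj["case"]] += w
    return {case: num[case] / den[case] for case in num}


if __name__ == "__main__":
    # Toy phase-1 cohort: the rare stratum 1 has higher biomarker values.
    phase1, truth = [], {}
    spec = [(1, 0, 90, 1.0), (1, 1, 10, 3.0), (0, 0, 180, 0.5), (0, 1, 20, 2.5)]
    for case, stratum, size, value in spec:
        for _ in range(size):
            sid = len(phase1)
            phase1.append({"id": sid, "case": case, "stratum": stratum})
            truth[sid] = value
    selected, weights = two_phase_sample(phase1, n_per_stratum=5)
    assayed = {s["id"]: truth[s["id"]] for s in selected}
    print(weighted_mean_by_case(selected, weights, assayed))
```

In this toy example the biomarker is constant within strata, so the weighted means reproduce the full-cohort means (1.2 for cases, 0.7 for controls), whereas the unweighted phase-2 means (2.0 and 1.5) overstate them because the rare stratum is oversampled.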