Authors: David J. Miller, Yanxin Zhang, Guoqiang Yu, Yongmei Liu, Li Chen
DOI: 10.1093/BIOINFORMATICS/BTP435
Keywords: Machine learning, Sample size determination, Computational biology, Artificial neural network, Support vector machine, Principle of maximum entropy, Conditional probability, Genome-wide association study, Entropy (information theory), Artificial intelligence, Bayesian information criterion, Mathematics
Abstract: Motivation: In both genome-wide association studies (GWAS) and pathway analysis, the modest sample size relative to the number of genetic markers presents formidable computational, statistical and methodological challenges for accurately identifying markers/interactions and for building phenotype-predictive models. Results: We address these objectives via maximum entropy conditional probability modeling (MECPM), coupled with a novel model structure search. Unlike neural networks and support vector machines (SVMs), MECPM makes explicit and is determined by the interactions that confer phenotype-predictive power. Our method identifies both a marker subset and the multiple k-way interactions between these markers. Additional key aspects are: (i) evaluation of a select subset of up to five-way interactions while retaining relatively low complexity; (ii) flexible single nucleotide polymorphism (SNP) coding (dominant, recessive) within each interaction; (iii) no mathematical interaction form assumed; (iv) model order selection based on the Bayesian Information Criterion, which fairly compares interactions at different orders and automatically sets the experiment-wide significance level; (v) MECPM directly yields a phenotype-predictive model. MECPM was compared with a panel of methods on datasets with up to 1000 SNPs and up to eight embedded penetrance function (i.e. ground-truth) interactions, including a five-way, involving less than 20 SNPs. MECPM achieved improved sensitivity and specificity for detecting both ground-truth markers and interactions, compared with previous methods. Availability: http://www.cbil.ece.vt.edu/ResearchOngoingSNP.htm Contact: djmiller@engr.psu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
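The sketch below illustrates two ideas named in the abstract: per-interaction SNP coding (dominant or recessive) and order selection by the Bayesian Information Criterion. It is not the authors' MECPM code. The genotype encoding as a minor-allele count (0, 1, 2), the indicator-style interaction feature, and the names `code_snp`, `interaction_feature`, and `bic` are all assumptions made for illustration.

```python
import math


def code_snp(genotype, mode):
    """Binary-code a genotype given as a minor-allele count (0, 1, or 2).

    Assumed convention: "dominant" is active with at least one minor allele,
    "recessive" is active only with two minor alleles.
    """
    if mode == "dominant":
        return 1 if genotype >= 1 else 0
    if mode == "recessive":
        return 1 if genotype == 2 else 0
    raise ValueError(f"unknown coding mode: {mode}")


def interaction_feature(genotypes, terms):
    """Return 1 if every (snp_index, mode) term of a k-way interaction is active.

    `terms` is a hypothetical representation: one (index, coding mode) pair
    per participating SNP, so each SNP can use its own coding.
    """
    return int(all(code_snp(genotypes[i], mode) for i, mode in terms))


def bic(log_likelihood, n_params, n_samples):
    """Standard BIC, n_params * ln(n_samples) - 2 * log_likelihood; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


# A two-way interaction: SNP 0 coded recessive, SNP 1 coded dominant.
terms = [(0, "recessive"), (1, "dominant")]
active = interaction_feature([2, 1, 0], terms)
inactive = interaction_feature([1, 1, 0], terms)
```

Under this reading, a candidate interaction would be kept only if the gain in log-likelihood outweighs the `n_params * ln(n_samples)` penalty. That penalty is what allows interactions of different orders to be compared on a common scale.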