Authors: Hirohisa Kishino, Peter J. Waddell
Keywords: Tree (data structure), Graphical model, Sampling distribution, Conditional probability distribution, Inference, Data mining, Biology, Partial correlation, Cluster analysis, Latent variable
Abstract: At present, there is a lack of sound methodology to infer causal gene expression relationships on a genome-wide basis. We address this first by examining the behaviour of some of the latest and fastest algorithms for tree and cluster analysis, particularly hierarchical methods popular in phylogenetics. Combined with these are two novel distances based on partial, rather than full, correlations. Theoretically, partial correlations should provide better evidence for regulatory genetic links than standard correlations. To compare the clusters obtained by many alternative methods, we use tree consensus methods. To compare the methods of analysis, we used tree partition metrics followed by another level of clustering. These, and a tree fit metric, all suggest that the new distances give quite different trees from those usually obtained. In the second part, we consider graphical modeling of the interactions of important genes of the cell cycle. Despite the models seeming to fit well on occasions, and despite the experimental error structure being close to multivariate normal, there are considerable problems to overcome. Latent variables, in this case important genes missing from the analysis, are inferred to have a strong effect on the partial correlations. Also, the data show clear evidence of sampling distributions conditional on the status of cancer-related genes, including TP53. Without full information on which genes are wild type, an appropriate graphical model cannot be fitted. These findings point to the need to include and distinguish not only all relevant genes but also all splice variants in the design phase of the microarray analysis. Failure to do so will induce problems similar to those of both latent variables and conditional distributions.
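
The contrast between full and partial correlations can be illustrated with a minimal sketch. It is not the authors' method or either of their two distances, which the abstract does not specify. It assumes a simulated three-variable chain x -> y -> z (made-up data, not microarray measurements), and the helper names `pearson`, `partial_corr`, and `simulate_chain` are hypothetical. It uses the standard first-order partial correlation formula to show that x and z are strongly correlated overall, yet nearly uncorrelated once the intermediate y is controlled for.

```python
import math
import random


def pearson(a, b):
    """Full (ordinary) Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, z, y):
    """First-order partial correlation of x and z, controlling for y."""
    r_xz = pearson(x, z)
    r_xy = pearson(x, y)
    r_zy = pearson(z, y)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))


def simulate_chain(n=5000, seed=0):
    """Simulated chain x -> y -> z with unit-variance Gaussian noise (assumed toy data)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    z = [yi + rng.gauss(0, 1) for yi in y]
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate_chain()
    print("full correlation r(x, z):       ", round(pearson(x, z), 3))
    print("partial correlation r(x, z | y):", round(partial_corr(x, z, y), 3))
```

In this toy setting the full correlation suggests a direct x-z link that the partial correlation removes. If y were left out of the data, it would act as a latent variable and no such correction would be possible.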