Statistical inference from large-scale genomic data

Author: Yinyin Yuan

Abstract: This thesis explores the potential of statistical inference methodologies in their applications in functional genomics. In essence, it summarises algorithmic findings in this field, providing step-by-step analytical methodologies for deciphering biological knowledge from large-scale genomic data, mainly microarray gene expression time series. This thesis covers a range of topics in the investigation of complex multivariate genomic data. One focus involves using clustering as a method of inference, and another is cluster validation to extract meaningful biological information from the data. Information gained from the application of these various techniques can then be used conjointly in the elucidation of gene regulatory networks, the ultimate goal of this type of analysis. First, a new tight clustering method for gene expression data is proposed to obtain tighter and potentially more informative gene clusters. Next, to fully utilise biological knowledge in clustering validation, a validity index is defined based on one of the most important ontologies within the Bioinformatics community, Gene Ontology. The method bridges a gap in the current literature, in the sense that it takes into account not only the variations of Gene Ontology categories in biological specificities and their significance to the gene clusters, but also the complex structure of the Gene Ontology. Finally, Bayesian probability is applied to making inference from heterogeneous genomic data, integrated with previous efforts in this thesis, for the aim of large-scale gene network inference. The proposed system comes with a stochastic process to achieve robustness to noise, yet remains efficient enough for large-scale analysis. Ultimately, the solutions presented in this thesis serve as building blocks of an intelligent system for interpreting large-scale genomic data and understanding the functional organisation of the genome.
