Classification of genes using probabilistic models of microarray expression profiles

Authors: Paul Pavlidis, Christopher Tang, William Stafford Noble

Keywords: Training set, Expression (mathematics), Statistical model, Probabilistic logic, Microarray, Class (biology), Sequence analysis, Pattern recognition, Gene, Computational biology, Computer science, Artificial intelligence, Supervised learning

Abstract: Microarray expression data provide a new method for classifying genes and gene products according to their expression profiles. Numerous unsupervised and supervised learning methods have been applied to the task of discovering and learning to recognize classes of co-expressed genes. Here we present a supervised learning method based upon techniques borrowed from biological sequence analysis. The expression profile of a class of genes is summarized in a probabilistic model similar to a position-specific scoring matrix (PSSM). This model provides insight into the characteristics of the class, as well as accurate recognition performance. Because PSSM models are generative, they are particularly useful when a biologist can identify a class of genes a priori but is unable to assemble a large collection of non-class members to serve as a negative training set. We validate the technique using expression data from S. cerevisiae and C. elegans.
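
To make the PSSM analogy concrete, here is a minimal sketch of one way such a model could be built, not the authors' exact formulation. Each microarray experiment plays the role of a sequence position, and each expression value is discretized into a small alphabet (here down / unchanged / up). Per-experiment symbol probabilities are estimated from known class members only, which is what makes the model generative and removes the need for a negative training set. A gene is then scored by the log-odds sum over experiments, log(P_class(symbol) / P_background(symbol)). The three-bin alphabet, the thresholds of ±0.5, the pseudocount of 1, the uniform background, and the function names `discretize`, `train_pssm` and `log_odds_score` are all illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence

NUM_BINS = 3  # assumed alphabet: 0 = down, 1 = unchanged, 2 = up


def discretize(value: float, low: float = -0.5, high: float = 0.5) -> int:
    """Map one expression log-ratio to a symbol (thresholds are assumptions)."""
    if value < low:
        return 0
    if value > high:
        return 2
    return 1


def train_pssm(profiles: Sequence[Sequence[float]],
               pseudocount: float = 1.0) -> List[List[float]]:
    """Estimate per-experiment symbol probabilities from class members only.

    Each row of the returned matrix is one experiment ("position");
    each column is the probability of one discretized symbol.
    """
    n_experiments = len(profiles[0])
    model = []
    for j in range(n_experiments):
        # Pseudocounts keep unseen symbols from getting probability zero.
        counts = [pseudocount] * NUM_BINS
        for profile in profiles:
            counts[discretize(profile[j])] += 1
        total = sum(counts)
        model.append([c / total for c in counts])
    return model


def log_odds_score(model: List[List[float]], profile: Sequence[float],
                   background: Optional[List[float]] = None) -> float:
    """Sum of log(P_class / P_background) over experiments, as in a PSSM."""
    if background is None:
        # Assumed uniform background; a genome-wide estimate could be used instead.
        background = [1.0 / NUM_BINS] * NUM_BINS
    score = 0.0
    for row, value in zip(model, profile):
        symbol = discretize(value)
        score += math.log(row[symbol] / background[symbol])
    return score


# Toy example: three known class members measured in three experiments.
class_members = [[1.2, 0.9, -1.1], [0.8, 1.5, -0.7], [1.0, 0.2, -1.3]]
model = train_pssm(class_members)
print(f"{log_odds_score(model, [1.1, 1.0, -0.9]):.3f}")    # → 1.792
print(f"{log_odds_score(model, [-1.0, -1.2, 1.0]):.3f}")   # → -2.079
```

A gene whose profile resembles the class receives a positive score and a dissimilar gene a negative one; in practice a decision threshold on this score would still have to be chosen, for example from held-out class members.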