Discovering significant OPSM subspace clusters in massive gene expression data

Authors: Byron J. Gao, Obi L. Griffith, Martin Ester, Steven J. M. Jones

DOI: 10.1145/1150402.1150529

Keywords:

Abstract: Order-preserving submatrixes (OPSMs) have been accepted as a biologically meaningful subspace cluster model, capturing the general tendency of gene expressions across a subset of conditions. In an OPSM, the expression levels of all genes induce the same linear ordering of the conditions. OPSM mining is reducible to a special case of the sequential pattern mining problem, in which a pattern and its supporting sequences uniquely specify an OPSM cluster. Small twig clusters, specified by long patterns with naturally low support, incur explosive computational costs and would be completely pruned off by most existing methods for massive datasets containing thousands of conditions and hundreds of thousands of genes, which are common in today's gene expression analysis. However, it is of particular interest to biologists to reveal such small groups of genes that are tightly coregulated under many conditions, since some pathways or processes might require only two genes to act in concert. In this paper, we introduce the KiWi mining framework for massive datasets, which exploits two parameters k and w to provide a biased testing on a bounded number of candidates, substantially reducing the search space and problem scale, and targeting highly promising seeds that lead to significant clusters and twig clusters. Extensive biological and computational evaluations on real datasets demonstrate that KiWi can effectively mine biologically meaningful OPSM subspace clusters with good efficiency and scalability.
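The reduction described in the abstract can be illustrated with a minimal sketch. This is not the KiWi framework, only the basic idea that each gene becomes a sequence of conditions sorted by expression level, and that a gene supports a pattern when the pattern is a subsequence of that sequence. The function names, the toy matrix, and the tie-breaking rule (ties resolved by condition index) are assumptions made for illustration.

```python
def row_to_sequence(row):
    """Order condition indices by increasing expression value.

    Ties are broken by condition index (an assumption of this sketch).
    """
    return sorted(range(len(row)), key=lambda i: row[i])


def supports(sequence, pattern):
    """Return True if `pattern` is a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    # `c in it` advances the iterator, so matches must appear in order.
    return all(c in it for c in pattern)


def supporting_genes(matrix, pattern):
    """Indices of genes (rows) whose condition ordering contains `pattern`."""
    return [g for g, row in enumerate(matrix)
            if supports(row_to_sequence(row), pattern)]


# Hypothetical toy data: 3 genes (rows) by 4 conditions (columns).
expr = [
    [0.1, 0.9, 0.5, 0.3],
    [2.0, 8.0, 5.0, 1.0],
    [0.7, 0.2, 0.4, 0.9],
]
pattern = [0, 2, 1]  # condition 0 < condition 2 < condition 1

print(supporting_genes(expr, pattern))  # → [0, 1]
```

Here genes 0 and 1 together with conditions {0, 2, 1} form an OPSM: both rows rise in the same order across those conditions, even though their absolute expression levels differ.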

References (20)
George M. Church, Yizong Cheng. Biclustering of Expression Data. Intelligent Systems in Molecular Biology, vol. 8, pp. 93-103 (2000).
Ron Rymon. Search through systematic set enumeration. Principles of Knowledge Representation and Reasoning, pp. 539-550 (1992).
Ramakrishnan Srikant, Rakesh Agrawal. Mining sequential patterns: Generalizations and performance improvements. Advances in Database Technology — EDBT '96, pp. 1-17 (1996). DOI: 10.1007/BFB0014140
Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98, vol. 27, pp. 94-105 (1998). DOI: 10.1145/276304.276314
Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu. Clustering by pattern similarity in large data sets. Proceedings of the 2002 ACM SIGMOD international conference on Management of data - SIGMOD '02, pp. 394-405 (2002). DOI: 10.1145/564691.564737
R. Albert. Scale-free networks in cell biology. Journal of Cell Science, vol. 118, pp. 4947-4957 (2005). DOI: 10.1242/JCS.02714
Obi L. Griffith, Erin D. Pleasance, Debra L. Fulton, Mehrdad Oveisi, Martin Ester, Asim S. Siddiqui, Steven J.M. Jones. Assessment and integration of publicly available SAGE, cDNA microarray, and oligonucleotide microarray expression data for global coexpression analyses. Genomics, vol. 86, pp. 476-488 (2005). DOI: 10.1016/J.YGENO.2005.06.009
Ingrid Hedenfalk, David Duggan, Yidong Chen, Michael Radmacher, Michael Bittner, Richard Simon, Paul Meltzer, Barry Gusterson, Manel Esteller, Mark Raffeld, Zohar Yakhini, Amir Ben-Dor, Edward Dougherty, Juha Kononen, Lukas Bubendorf, Wilfrid Fehrle, Stefania Pittaluga, Sofia Gruvberger, Niklas Loman, Oskar Johannsson, Håkan Olsson, Benjamin Wilfond, Guido Sauter, Olli-P. Kallioniemi, Åke Borg, Jeffrey Trent. Gene-Expression Profiles in Hereditary Breast Cancer. The New England Journal of Medicine, vol. 344, pp. 539-548 (2001). DOI: 10.1056/NEJM200102223440801
A. I. Su, M. P. Cooke, K. A. Ching, Y. Hakak, J. R. Walker, T. Wiltshire, A. P. Orth, R. G. Vega, L. M. Sapinoso, A. Moqrich, A. Patapoutian, G. M. Hampton, P. G. Schultz, J. B. Hogenesch. Large-scale analysis of the human and mouse transcriptomes. Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 4465-4470 (2002). DOI: 10.1073/PNAS.012025199
Amir Ben-Dor, Benny Chor, Richard Karp, Zohar Yakhini. Discovering local structure in gene expression data: the order-preserving submatrix problem. Journal of Computational Biology, vol. 10, pp. 373-384 (2003). DOI: 10.1089/10665270360688075