Stable biclustering of gene expression data with nonnegative matrix factorizations

Authors: Doina Tilivea, Liviu Badea

DOI:

Keywords:

Abstract: Although clustering is probably the most frequently used tool for data mining gene expression data, existing approaches face at least one of the following problems in this domain: a huge number of variables (genes) as compared to the number of samples, high noise levels, the inability to naturally deal with overlapping clusters, the instability of the resulting clusters w.r.t. the initialization of the algorithm, as well as the difficulty in clustering genes and samples simultaneously. In this paper we show that all of these problems can be elegantly dealt with by using nonnegative matrix factorizations to cluster genes and samples simultaneously while allowing for bicluster overlaps, and by employing Positive Tensor Factorization to perform a two-way meta-clustering of the biclusters produced in several different clustering runs (thereby addressing the above-mentioned instability). The application of our approach to a large lung cancer dataset proved computationally tractable and was able to recover the histological classification of the various cancer subtypes represented in the dataset.
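The factorization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard multiplicative update rules of Lee and Seung for the Frobenius-norm objective, which may differ from the exact variant the authors use, and it does not cover the Positive Tensor Factorization meta-clustering step. The function names `nmf` and `biclusters`, the `threshold` parameter, and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix X (genes x samples) as W @ H.

    Uses Lee-Seung multiplicative updates for the squared Frobenius error.
    W has shape (genes, k) and H has shape (k, samples); both stay >= 0.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def biclusters(W, H, threshold=0.5):
    """Read one bicluster per factor (hypothetical membership rule).

    A gene or sample joins bicluster c if its loading is at least
    `threshold` times the largest loading of factor c. Because membership
    is decided per factor, a gene or sample may belong to several
    biclusters, so overlaps are allowed.
    """
    result = []
    for c in range(W.shape[1]):
        genes = np.flatnonzero(W[:, c] >= threshold * W[:, c].max())
        samples = np.flatnonzero(H[c, :] >= threshold * H[c, :].max())
        result.append((genes, samples))
    return result


# Toy data (assumption): 6 genes x 4 samples with two expression blocks.
X_toy = np.zeros((6, 4))
X_toy[:3, :2] = 5.0
X_toy[3:, 2:] = 3.0
W_toy, H_toy = nmf(X_toy, k=2)
toy_biclusters = biclusters(W_toy, H_toy)
```

Because the result depends on the random initialization (`seed`), different runs can yield different biclusters; this is the instability that the abstract proposes to address by meta-clustering the outputs of several runs.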

References (9)
George M. Church, Yizong Cheng. Biclustering of Expression Data. Intelligent Systems in Molecular Biology, vol. 8, pp. 93-103 (2000).
Philip M. Kim, Bruce Tidor. Subsystem Identification Through Dimensionality Reduction of Large-Scale Gene Expression Data. Genome Research, vol. 13, pp. 1706-1718 (2003). DOI: 10.1101/GR.903503
Max Welling, Markus Weber. Positive Tensor Factorization. Pattern Recognition Letters, vol. 22, pp. 1255-1261 (2001). DOI: 10.1016/S0167-8655(01)00070-8
Sven Bergmann, Jan Ihmels, Naama Barkai. Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. Physical Review E, vol. 67, p. 031902 (2003). DOI: 10.1103/PHYSREVE.67.031902
A. Bhattacharjee, W. G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E. J. Mark, E. S. Lander, W. Wong, B. E. Johnson, T. R. Golub, D. J. Sugarbaker, M. Meyerson. Classification of Human Lung Carcinomas by mRNA Expression Profiling Reveals Distinct Adenocarcinoma Subclasses. Proceedings of the National Academy of Sciences of the United States of America, vol. 98, pp. 13790-13795 (2001). DOI: 10.1073/PNAS.191502998
J.-P. Brunet, P. Tamayo, T. R. Golub, J. P. Mesirov. Metagenes and Molecular Pattern Discovery Using Matrix Factorization. Proceedings of the National Academy of Sciences of the United States of America, vol. 101, pp. 4164-4169 (2004). DOI: 10.1073/PNAS.0308531101
Liviu Badea. Clustering and Metaclustering with Nonnegative Matrix Decompositions. Machine Learning: ECML 2005, pp. 10-22 (2005). DOI: 10.1007/11564096_7
Daniel D. Lee, H. Sebastian Seung. Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems, vol. 13, pp. 556-562 (2001).