Authors: Doina Tilivea, Liviu Badea
DOI:
Keywords:
Abstract: Although clustering is probably the most frequently used tool for data mining gene expression data, existing approaches face at least one of the following problems in this domain: a huge number of variables (genes) as compared to the number of samples, high noise levels, the inability to naturally deal with overlapping clusters, the instability of the resulting clusters w.r.t. the initialization of the algorithm, as well as the difficulty of clustering genes and samples simultaneously. In this paper we show that all of these problems can be elegantly dealt with by using nonnegative matrix factorizations to cluster genes and samples simultaneously while allowing bicluster overlaps, and by employing Positive Tensor Factorization to perform a two-way meta-clustering of the biclusters produced in several different runs (thereby addressing the above-mentioned instability). The application of our approach to a large lung cancer dataset proved computationally tractable and was able to recover the histological classification of the various subtypes represented in the dataset.
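The abstract does not spell out which factorization algorithm is used. As a minimal sketch, assuming the standard Lee and Seung multiplicative updates for Frobenius-norm NMF (not necessarily the authors' variant), the code below shows how a single factorization X ≈ WH of a nonnegative genes × samples matrix can be read as overlapping biclusters: column j of W scores genes and row j of H scores samples for bicluster j. The read-out rule in `biclusters` and its `frac` parameter are hypothetical illustrations, and the Positive Tensor Factorization meta-clustering across runs is not shown.

```python
import random


def transpose(A):
    """Transpose a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain-Python product of A (n x p) and B (p x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def frob_error(X, W, H):
    """Frobenius norm of the residual X - WH."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry)) ** 0.5


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Factor nonnegative X (n x m) as W (n x k) @ H (k x m) using
    Lee-Seung multiplicative updates, which keep W and H nonnegative."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def biclusters(W, H, frac=0.5):
    """Hypothetical read-out: gene i (sample s) joins bicluster j when its
    loading is at least `frac` times the largest loading of factor j.
    A gene or sample may join several biclusters, so overlaps are allowed."""
    result = []
    for j in range(len(H)):
        gene_scores = [row[j] for row in W]
        gmax, smax = max(gene_scores), max(H[j])
        genes = [i for i, v in enumerate(gene_scores) if v >= frac * gmax]
        samples = [s for s, v in enumerate(H[j]) if v >= frac * smax]
        result.append((genes, samples))
    return result


if __name__ == "__main__":
    # Toy data: two blocks that share gene 2 (an overlapping bicluster).
    X = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [5, 5, 5, 5],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]
    W, H = nmf(X, k=2)
    print("residual:", round(frob_error(X, W, H), 4))
    for genes, samples in biclusters(W, H):
        print("genes", genes, "samples", samples)
```

On this toy matrix one would expect gene 2 to land in both biclusters, which a hard partitioning method such as k-means cannot express. Rerunning `nmf` with different `seed` values can give differently ordered or slightly different factors; that run-to-run variability is what the meta-clustering step described in the abstract is meant to reconcile.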