Nonparametric Bayesian Biclustering

Authors: Edward Meeds, Sam Roweis

Abstract: We present a probabilistic block-constant biclustering model that simultaneously clusters the rows and columns of a data matrix. All entries with the same row cluster and column cluster form a bicluster. Each cluster is part of a mixture having a nonparametric Bayesian prior. The number of biclusters is therefore treated as a nuisance parameter and is implicitly integrated over during simulation. Missing entries are completely integrated out of the model, which allows us to bypass the common requirement for biclustering algorithms that missing values be filled before analysis, and also makes the model robust to high rates of missing values. By using a Gaussian model for the density of entries in biclusters, an efficient sampling algorithm is produced because the bicluster parameters are analytically integrated out. We present several inference procedures for sampling cluster indicators, including Gibbs and split-merge moves. We show that our method is competitive with, if not superior to, existing imputation methods, especially for high missing rates, despite imputing a constant value for entire blocks of data. We present imputation experiments and exploratory biclustering results.
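To make the idea of block-constant imputation concrete, the following is a minimal sketch and not the authors' sampler. It assumes the row and column cluster labels are already given (in the model they are sampled), and it fills each missing entry with the plain mean of the observed entries in its bicluster instead of a posterior mean under the Gaussian model. The function name `block_constant_impute`, the NaN encoding of missing values, and the global-mean fallback for blocks with no observed entries are all assumptions made for illustration.

```python
import numpy as np


def block_constant_impute(X, row_labels, col_labels):
    """Fill NaN entries with the observed mean of their bicluster.

    Hypothetical helper: labels are assumed fixed, and the block value is a
    simple observed mean, not a posterior estimate.
    """
    X = np.asarray(X, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    filled = X.copy()

    observed = ~np.isnan(X)
    # Assumed fallback for blocks that contain no observed entries at all.
    global_mean = X[observed].mean() if observed.any() else 0.0

    for r in np.unique(row_labels):
        rows = np.where(row_labels == r)[0]
        for c in np.unique(col_labels):
            cols = np.where(col_labels == c)[0]
            index = np.ix_(rows, cols)
            block = X[index]
            block_observed = ~np.isnan(block)
            # Missing entries are ignored when estimating the block constant.
            if block_observed.any():
                value = block[block_observed].mean()
            else:
                value = global_mean
            # Fancy indexing returns a copy, so write the result back.
            patch = filled[index]
            patch[~block_observed] = value
            filled[index] = patch
    return filled
```

Because every missing entry in a bicluster receives the same value, the quality of the imputation depends entirely on how well the row and column clusters capture the block structure of the matrix.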
