Bayesian Cluster Analysis: Point Estimation and Credible Balls

Authors: Zoubin Ghahramani, Sara Wade

DOI: 10.1214/17-BA1073

Keywords: Maximum a posteriori estimation; Data mining; Variation of information; Cluster analysis; Credible interval; Bayesian probability; Mixture model; Directed acyclic graph; Point estimation; Mathematics

Abstract: Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to popular algorithms such as agglomerative hierarchical clustering or k-means, which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties such as uncertainty on the number of clusters. However, an important problem is how to summarize the posterior; the huge dimension of partition space and the difficulty of visualizing it add to this problem. In a Bayesian analysis, the posterior of a real-valued parameter of interest is often summarized by reporting a point estimate, such as the posterior mean, along with 95% credible intervals to characterize uncertainty. In this paper, we extend these ideas to develop appropriate point estimates and credible sets that summarize the posterior of the clustering structure, based on decision and information theoretic techniques.
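As a rough illustration of the decision-theoretic summaries described above, the following Python sketch computes the variation of information (VI) between two partitions, picks a point estimate by minimizing a Monte Carlo estimate of the posterior expected VI loss, and computes the radius of a VI credible ball around that estimate. Assumptions: posterior samples of the partition are available as label vectors (e.g. from an MCMC sampler); the search for the estimate is restricted to the sampled partitions, which is a simplification, and the paper's own optimization procedure may differ; the credible ball is taken to be the smallest VI ball around the estimate that holds at least the requested fraction of samples. The function names and the toy samples are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# A partition of n data points, given as one cluster label per point.
# Only the grouping matters; the label names themselves are arbitrary.
Partition = Sequence[Hashable]


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of the empirical cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def variation_of_information(a: Partition, b: Partition) -> float:
    """VI(a, b) = H(a | b) + H(b | a) = 2 H(a, b) - H(a) - H(b)."""
    if len(a) != len(b):
        raise ValueError("partitions must cover the same data points")
    joint = list(zip(a, b))  # label pairs identify the intersection clusters
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2 * entropy(joint) - entropy(a) - entropy(b))


def point_estimate(samples: Sequence[Partition]) -> Tuple[int, List[float]]:
    """Return the index of the sampled partition with the smallest Monte Carlo
    estimate of posterior expected VI loss, plus all the estimates
    (search restricted to the samples themselves)."""
    m = len(samples)
    expected = [
        sum(variation_of_information(c, s) for s in samples) / m for c in samples
    ]
    best = min(range(m), key=expected.__getitem__)
    return best, expected


def credible_ball_radius(
    estimate: Partition, samples: Sequence[Partition], level: float = 0.95
) -> float:
    """Smallest VI radius around `estimate` whose ball contains at least
    `level` of the posterior samples."""
    dists = sorted(variation_of_information(estimate, s) for s in samples)
    # The ball must contain k = ceil(level * m) samples; the small tolerance
    # guards against round-off in level * m (e.g. 0.95 * 20).
    k = max(1, math.ceil(level * len(dists) - 1e-9))
    return dists[k - 1]


if __name__ == "__main__":
    # Hypothetical MCMC output for 4 data points (not data from the paper).
    samples = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 0, 0],  # same partition as above, with labels swapped
        [0, 0, 0, 1],
        [0, 1, 2, 3],
    ]
    best, expected = point_estimate(samples)
    radius = credible_ball_radius(samples[best], samples, level=0.8)
    print("point estimate:", samples[best])
    print("80% credible-ball radius (VI):", round(radius, 4))
```

With these toy samples, the selected estimate is the grouping {1, 2}{3, 4}, which appears three times (once with swapped labels). Because VI depends only on the grouping and not on the label names, relabeled copies count as the same partition. Natural logarithms are used here; a different base only rescales VI.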
