Co-clustering algorithms for distributional data with automated variable weighting

Authors: Francisco de A. T. De Carvalho, Antonio Balzanella, Antonio Irpino, Rosanna Verde

DOI: 10.1016/J.INS.2020.11.018

Keywords: Variable (mathematics), Row and column spaces, Biclustering, Relevance (information retrieval), Set (abstract data type), Histogram, Algorithm, Mathematics, Structure (category theory), Weighting

Abstract: This paper is concerned with the co-clustering of distribution-valued data, that is, the simultaneous partitioning of the rows and columns of an input data table, the elements of which are distributions (or histograms) representing aggregate data. The first proposed method extends the double k-means algorithm to distributional data using the L2 Wasserstein distance, also known as Mallows' distance, to compare distributions. To account for the different relevance of the variables in characterizing the clusters, four variants of adaptive algorithms are proposed. Accordingly, an additional step is introduced in the co-clustering procedure to compute the weights associated with the variables. In particular, each of the algorithms provides: i) a set of weights for the variables; ii) sets of weights for the variables, one for each cluster (cluster-wise); iii) a set of weights for the variables according to a decomposition of the distance into two components; iv) sets of weights for the variables according to the decomposition of the distance into two components, one for each cluster (cluster-wise). Applications using simulated and real data demonstrate the effectiveness of the algorithms and the contribution of the weighting procedure to the co-clustering structure.
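
To make the distance in the abstract concrete, the following is a minimal sketch of the squared L2 Wasserstein distance between two one-dimensional empirical distributions and of one standard way to split it into two components. It is not the authors' implementation. It assumes that each distribution is given as a sample of equal size (the paper works with histograms), so the sorted samples act as quantile functions on a common grid. It also assumes that the two components are a location term (squared difference of the means) and a variability term (distance between the centered quantile functions); the abstract does not say which decomposition is used. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def w2_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared L2 Wasserstein distance between two equal-size empirical samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # Sorted samples are the empirical quantile functions on a common grid.
    qx, qy = sorted(x), sorted(y)
    return sum((a - b) ** 2 for a, b in zip(qx, qy)) / len(qx)


def w2_squared_decomposed(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Split the squared distance into (location, variability) components.

    Assumption: location = squared difference of means; variability =
    mean squared difference of the centered quantile functions.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    n = len(x)
    qx, qy = sorted(x), sorted(y)
    mx, my = sum(qx) / n, sum(qy) / n
    location = (mx - my) ** 2
    variability = sum(((a - mx) - (b - my)) ** 2 for a, b in zip(qx, qy)) / n
    return location, variability


if __name__ == "__main__":
    x, y = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
    loc, var = w2_squared_decomposed(x, y)
    # The two components add up to the full squared distance.
    print(loc, var, w2_squared(x, y))
```

The two components sum to the full squared distance because the cross term between the mean difference and the centered quantile functions integrates to zero. A weighting scheme of the kind described in items iii) and iv) could then assign separate weights to each component of each variable.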
