Authors: Francisco de A. T. De Carvalho, Antonio Balzanella, Antonio Irpino, Rosanna Verde
DOI: 10.1016/J.INS.2020.11.018
Keywords: Variable (mathematics), Row and column spaces, Biclustering, Relevance (information retrieval), Set (abstract data type), Histogram, Algorithm, Mathematics, Structure (category theory), Weighting
Abstract: This paper is concerned with the co-clustering of distribution-valued data, that is, the simultaneous partitioning of the rows and the columns of an input data table whose elements are distributions (or histograms) representing aggregate data. The first proposed method extends the double k-means algorithm to distributional data, using the L2 Wasserstein distance, also known as Mallows' distance, to compare distributions. To account for the different relevance of the variables in characterizing the clusters, four adaptive variants are proposed. Accordingly, each procedure introduces an additional step that computes relevance weights associated with the variables. In particular, the algorithms provide, respectively: i) a single set of weights for the variables; ii) sets of weights for the variables, one for each cluster (cluster-wise); iii) a set of weights for the variables according to the decomposition of the distance into two components; iv) sets of weights for the components, one for each cluster (cluster-wise). Applications using simulated and real data demonstrate the effectiveness of the proposed procedures and show the contribution of the variables to the co-clustering structure.
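For one-dimensional distributions, the L2 Wasserstein distance equals the L2 distance between the two quantile functions. The minimal Python sketch below assumes each distribution is summarized by its empirical quantile function on a uniform grid of probability levels; the helper names are hypothetical, not taken from the paper. It computes the squared distance and splits it into a position (mean) component and a variability (centered-quantile) component. This is a commonly used two-component decomposition, and we only assume it matches the one behind variants iii) and iv).

```python
import numpy as np

def quantiles_on_grid(samples, m=100):
    """Empirical quantile function evaluated at m mid-point probability levels."""
    t = (np.arange(m) + 0.5) / m
    return np.quantile(np.asarray(samples, dtype=float), t)

def wasserstein2_sq(qa, qb):
    """Squared L2 Wasserstein distance, approximated as a mean over the grid."""
    return float(np.mean((qa - qb) ** 2))

def wasserstein2_decomposition(qa, qb):
    """Split the squared distance into position and variability parts.

    The centered quantile functions have zero mean on the grid, so the
    cross term vanishes and the two parts sum to wasserstein2_sq(qa, qb).
    """
    mu_a, mu_b = qa.mean(), qb.mean()
    position = float((mu_a - mu_b) ** 2)
    variability = float(np.mean(((qa - mu_a) - (qb - mu_b)) ** 2))
    return position, variability

# Usage: compare two simulated samples summarized by their quantile functions.
rng = np.random.default_rng(0)
qa = quantiles_on_grid(rng.normal(0.0, 1.0, 500))
qb = quantiles_on_grid(rng.normal(2.0, 3.0, 500))
pos, var = wasserstein2_decomposition(qa, qb)
```

A pure shift of a distribution changes only the position part, while a change of spread or shape shows up only in the variability part. Weighting the two parts separately is what lets the adaptive variants distinguish them.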
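A double k-means scheme with a single set of variable weights (variant i) can be sketched as alternating prototype, row, column and weight steps. This is a hypothetical illustration, not the authors' implementation. It assumes that every cell holds a quantile function on a shared grid, that the initial labels are random, and that the weights satisfy a product-to-one constraint, a choice often used in adaptive-distance clustering. Under that constraint, the optimal weight of variable j is the geometric mean of the per-variable costs divided by the cost of variable j.

```python
import numpy as np

def weighted_double_kmeans(X, K, H, n_iter=20, seed=0, eps=1e-12):
    """Co-cluster an N x P table of distributions (hypothetical sketch).

    X has shape (N, P, M): cell (i, j) stores the quantile function of one
    distribution on a shared grid of M probability levels, so the squared
    L2 Wasserstein distance is approximated by a mean over the grid.
    """
    N, P, M = X.shape
    rng = np.random.default_rng(seed)
    rows = rng.integers(K, size=N)      # row-cluster labels
    cols = rng.integers(H, size=P)      # column-cluster labels
    lam = np.ones(P)                    # variable weights, product fixed to 1
    G = np.zeros((K, H, M))             # block prototypes (quantile functions)
    history = []
    for _ in range(n_iter):
        # Prototype step: the weighted mean of the quantile functions in a
        # block is again a quantile function and minimizes the block cost.
        for k in range(K):
            for h in range(H):
                ri, cj = rows == k, cols == h
                if ri.any() and cj.any():
                    w = lam[cj][None, :, None]
                    block = X[np.ix_(ri, cj)]
                    G[k, h] = (w * block).sum(axis=(0, 1)) / (w.sum() * ri.sum())
        # D[i, j, k, h]: squared Wasserstein distance between x_ij and g_kh.
        D = ((X[:, :, None, None, :] - G[None, None]) ** 2).mean(axis=-1)
        # Row step: each row joins the row cluster with the lowest weighted cost.
        cost_rows = np.zeros((N, K))
        for j in range(P):
            cost_rows += lam[j] * D[:, j, :, cols[j]]
        rows = cost_rows.argmin(axis=1)
        # Column step: lam[j] > 0 is a common factor for column j, so it does
        # not change the argmin and is omitted here.
        cost_cols = np.zeros((P, H))
        for i in range(N):
            cost_cols += D[i, :, rows[i], :]
        cols = cost_cols.argmin(axis=1)
        # Weight step under the assumed product-to-one constraint.
        Dsel = D[np.arange(N)[:, None], np.arange(P)[None, :],
                 rows[:, None], cols[None, :]]
        per_var = Dsel.sum(axis=0) + eps
        lam = np.exp(np.log(per_var).mean()) / per_var
        history.append(float((lam * Dsel).sum()))
    return rows, cols, G, lam, history

# Usage: sorted samples serve as empirical quantile functions on a common grid.
rng = np.random.default_rng(1)
mu = 5.0 * (np.arange(10) >= 5)[:, None] + 10.0 * (np.arange(6) >= 3)[None, :]
X = np.sort(rng.normal(mu[:, :, None], 1.0, size=(10, 6, 50)), axis=-1)
rows, cols, G, lam, hist = weighted_double_kmeans(X, K=2, H=2, n_iter=15)
```

Each step minimizes the same weighted criterion with the other quantities held fixed, so the recorded criterion should not increase from one iteration to the next. A cluster-wise variant (ii) would instead keep one weight vector per cluster and normalize each of them separately.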