Authors: Zhihua Cai, Dianhong Wang, Liangxiao Jiang
DOI: 10.1007/978-3-540-74205-0_48
Keywords: Canopy clustering algorithm, Data set, FSA-Red Algorithm, CURE data clustering algorithm, Categorical variable, k-means clustering, Data stream clustering, k-medians clustering, Algorithm, Artificial intelligence, Cluster analysis, Data mining, Pattern recognition, Computer science
Abstract: Clustering is one of the most important tasks in data mining. The K-means algorithm is popular for achieving this task because of its efficiency. However, it works only on numeric values, although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm was presented to extend K-means to categorical domains. Unfortunately, K-modes suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the 36 well-known UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.
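
For readers unfamiliar with the K-modes baseline mentioned in the abstract, the following is a minimal sketch of its two core operations: the simple-matching dissimilarity between a categorical object and a cluster mode, and the per-attribute mode of a cluster. It illustrates the standard K-modes idea only and is not the K-distributions algorithm, which the abstract does not describe. The function names (`matching_dissimilarity`, `cluster_mode`, `assign`) and the toy records are assumptions made for illustration.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Per-attribute most frequent value of a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Index of the nearest mode for each record (ties go to the lowest index)."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]


# Hypothetical toy data: (colour, size) records.
records = [("red", "small"), ("red", "large"), ("blue", "large")]
modes = [("red", "small"), ("blue", "large")]
labels = assign(records, modes)
```

A full K-modes run would alternate `assign` and `cluster_mode` until the labels stop changing. Every object must be compared with every mode on each pass, which appears to be the cost the abstract refers to.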