K-Distributions: A New Algorithm for Clustering Categorical Data

Authors: Zhihua Cai, Dianhong Wang, Liangxiao Jiang

Keywords: Canopy clustering algorithm, Data set, FSA-Red Algorithm, CURE data clustering algorithm, Categorical variable, k-means clustering, Data stream clustering, k-medians clustering, Algorithm, Artificial intelligence, Cluster analysis, Data mining, Pattern recognition, Computer science

Abstract: Clustering is one of the most important tasks in data mining. The K-means algorithm is the most popular one for achieving this task because of its efficiency. However, it works only on numeric values, although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm was presented to extend the K-means algorithm to categorical domains. Unfortunately, it suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the well-known 36 UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.
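To make the dissimilarity computation mentioned in the abstract concrete, the following is a minimal sketch of one K-modes iteration, assuming the simple matching dissimilarity (the number of attributes on which an object and a mode disagree) that is commonly used with K-modes. It is not the authors' K-distributions algorithm, whose details this abstract does not give. The function names and the toy data are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(obj, mode):
    """Count the attributes on which the object and the mode disagree."""
    return sum(1 for a, b in zip(obj, mode) if a != b)


def cluster_mode(objects):
    """Return the per-attribute most frequent value among the given objects."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


def assign(objects, modes):
    """Assign each object to the index of its least dissimilar mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(obj, modes[k]))
        for obj in objects
    ]


# Hypothetical toy data: each object is a tuple of categorical values.
data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]

# Assumption: the first and third objects serve as the initial modes.
modes = [data[0], data[2]]

# One iteration: assign objects to modes, then recompute each cluster's mode.
labels = assign(data, modes)
new_modes = [
    cluster_mode([obj for obj, label in zip(data, labels) if label == k])
    for k in range(len(modes))
]
```

Every iteration compares each object against each of the k modes attribute by attribute, which is the per-object cost the abstract refers to.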
