K-Distributions: A New Algorithm for Clustering Categorical Data

Authors: Zhihua Cai, Dianhong Wang, Liangxiao Jiang

DOI: 10.1007/978-3-540-74205-0_48

Keywords: Canopy clustering algorithm, Data set, FSA-Red Algorithm, CURE data clustering algorithm, Categorical variable, k-means clustering, Data stream clustering, k-medians clustering, Algorithm, Artificial intelligence, Cluster analysis, Data mining, Pattern recognition, Computer science

Abstract: Clustering is one of the most important tasks in data mining. The K-means algorithm is popular for achieving this task because of its efficiency. However, it works only on numeric values, although data sets in data mining often contain categorical values. Responding to this fact, the K-modes algorithm was presented to extend K-means to categorical domains. Unfortunately, K-modes suffers from computing the dissimilarity between each pair of objects and the mode of each cluster. Aiming at addressing these problems confronting K-modes, we present a new algorithm called K-distributions in this paper. We experimentally tested K-distributions using the 36 well-known UCI data sets selected by Weka, and compared it to K-modes. The experimental results show that K-distributions significantly outperforms K-modes in terms of clustering accuracy and log likelihood.
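
To make the K-modes baseline named in the abstract concrete, here is a minimal sketch of its two core operations as they are commonly described: a simple-matching dissimilarity between an object and a cluster mode, and the per-attribute mode of a cluster. The function names and the toy data are hypothetical. This illustrates standard K-modes only, not the K-distributions algorithm, whose details this record does not give.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(objects):
    """Most frequent value of each attribute across one cluster's objects."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))

def assign(objects, modes):
    """Give each object the index of the mode it is least dissimilar to."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(obj, modes[k]))
        for obj in objects
    ]

# Toy data (hypothetical): four objects with three categorical attributes.
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
]
modes = [data[0], data[2]]      # assumed initial modes, one per cluster
labels = assign(data, modes)    # one assignment pass
```

A full K-modes run would alternate `assign` and `cluster_mode` until the labels stop changing. Every pass compares each object with each mode, which is the dissimilarity computation the abstract refers to.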
