The analysis of a simple k-means clustering algorithm

作者: Tapas Kanungo , David M. Mount , Nathan S. Netanyahu , Christine Piatko , Ruth Silverman

DOI: 10.1145/336154.336189

关键词:

摘要: Abstract : K-means clustering is a very popular technique which used in numerous applications. Given set of n data points R(exp d) and an integer k, the problem to determine k d), called centers, so as minimize mean squared distance from each point its nearest center. A heuristic for k-means Lloyd's algorithm. In this paper, we present simple efficient implementation algorithm, call filtering This algorithm easy implement. It differs most other approaches that it precomputes kd-tree structure rather than center points. We establish practical efficiency two ways. First, data-sensitive analysis algorithm's running time. Second, have implemented performed number empirical studies, both on synthetically generated real applications color quantization, compression, segmentation.

参考文章(35)
Stavros G. Kolliopoulos, Satish Rao, A Nearly Linear-Time Approximation Scheme for the Euclidean kappa-median Problem european symposium on algorithms. pp. 378- 389 ,(1999) , 10.1007/3-540-48481-7_33
Allen Gersho, Robert M. Gray, Vector Quantization and Signal Compression ,(1991)
Songrit Maneewongvatana, David M. Mount, Analysis of Approximate Nearest Neighbor Searching with Clustered Point Sets Data Structures, Near Neighbor Searches, and Methodology. pp. 105- 123 ,(1999)
Keinosuke Fukunaga, Introduction to statistical pattern recognition (2nd ed.) Academic Press Professional, Inc.. ,(1990)
Richard C. Dubes, Anil K. Jain, Algorithms for clustering data ,(1988)
Christine Piatko, Tapas Kanungo, Ruth Silverman, Nathan S. Netanyahu, David M. Mount, Angela Y. Wu, Computing nearest neighbors for moving points and applications to clustering symposium on discrete algorithms. pp. 931- 932 ,(1999)
Sanjeev Arora, Prabhakar Raghavan, Satish Rao, Approximation schemes for Euclidean k-medians and related problems symposium on the theory of computing. pp. 106- 113 ,(1998) , 10.1145/276698.276718
Pravin M. Vaidya, AnO(n logn) algorithm for the all-nearest-neighbors Problem Discrete & Computational Geometry. ,vol. 4, pp. 101- 115 ,(1989) , 10.1007/BF02187718