KMN - Removing Noise from K-Means Clustering Results

Authors: Benjamin Schelling, Claudia Plant

DOI: 10.1007/978-3-319-98539-8_11

Keywords: Cluster analysis, Data set, Computer science, Noise, Pattern recognition, k-means clustering, Artificial intelligence, Data point

Abstract: K-Means is one of the most important data mining techniques for scientists who want to analyze their data. But K-Means has the disadvantage that it is unable to handle noise points. This paper proposes a technique that can be applied to the k-means clustering result to exclude noise points. We refer to it as KMN (short for K-Means with Noise). It is compatible with different strategies to initialize k-means and to determine the number of clusters. Moreover, it is completely parameter-free. The technique has been tested on artificial and real data sets to demonstrate its performance in comparison to other noise-excluding techniques for k-means.

References (15)
Juan Mendez, Javier Lorenzo. Computing Voronoi Adjacencies in High Dimensional Spaces by Using Linear Programming. In: Latorre Carmona P., Sánchez J., Fred A. (eds) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol. 30. Springer, New York, NY, pp. 33-49 (2013). DOI: 10.1007/978-1-4614-5076-4_3
Dan Pelleg, Andrew W. Moore. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. International Conference on Machine Learning, pp. 727-734 (2000).
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996).
David Avis, Komei Fukuda. A Pivoting Algorithm for Convex Hulls and Vertex Enumeration of Arrangements and Polyhedra. Symposium on Computational Geometry, vol. 8, pp. 98-104 (1991). DOI: 10.1145/109648.109659
Norman Lloyd Johnson, Samuel Kotz, N. Balakrishnan. Continuous Univariate Distributions (1994).
David Arthur, Sergei Vassilvitskii. k-means++: The Advantages of Careful Seeding. Symposium on Discrete Algorithms, pp. 1027-1035 (2007). DOI: 10.5555/1283383.1283494
Mohiuddin Ahmed, Abdun Naser Mahmood. A Novel Approach for Outlier Detection and Clustering Improvement. Conference on Industrial Electronics and Applications, pp. 577-582 (2013). DOI: 10.1109/ICIEA.2013.6566435
J. B. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, vol. 1, pp. 281-297 (1967).
Nguyen Xuan Vinh, Julien Epps, James Bailey. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. Journal of Machine Learning Research, vol. 11, pp. 2837-2854 (2010). DOI: 10.5555/1756006.1953024
Guilherme O. Campos, Arthur Zimek, Jörg Sander, Ricardo J. G. B. Campello, Barbora Micenková, Erich Schubert, Ira Assent, Michael E. Houle. On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study. Data Mining and Knowledge Discovery, vol. 30, pp. 891-927 (2016). DOI: 10.1007/S10618-015-0444-8