KMN - Removing Noise from K-Means Clustering Results

Authors: Benjamin Schelling, Claudia Plant

Keywords: Cluster analysis, Data set, Computer science, Noise, Pattern recognition, k-means clustering, Artificial intelligence, Data point

Abstract: K-Means is one of the most important data mining techniques for scientists who want to analyze their data. But K-Means has the disadvantage that it is unable to handle noise data points. This paper proposes a technique that can be applied to the k-means clustering result to exclude noise data points. We refer to it as KMN (short for K-Means with Noise). The technique is compatible with different strategies to initialize k-means and to determine the number of clusters. Moreover, it is completely parameter-free. The technique has been tested on artificial and real data sets to demonstrate its performance in comparison with other noise-excluding techniques for k-means.

