GAD: General Activity Detection for Fast Clustering on Large Data.

Authors: Jiawei Han, Zhijun Yin, Sangkyum Kim, Liangliang Cao, Xin Jin

DOI:

Keywords: Activity detection, Large scale data, Cluster analysis, Exploit, Set (abstract data type), Computer science, Scale (descriptive set theory), Data mining, General activity

Abstract: In this paper, we propose GAD (General Activity Detection) for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (1) an exact algorithm, E-GAD, which is much faster than K-Means and gets the same result; (2) approximate algorithms with different assumptions, which are faster than E-GAD while achieving different degrees of approximation; (3) GAD-based algorithms to handle the "large clusters" problem that appears in many applications. Two existing activity detection algorithms, GT and CGAUTC, are special cases under the framework. The most important contribution of our work is that the framework is a general solution to exploit activity detection in both exact and approximate scenarios, and the algorithms proposed within it can achieve very high speed. Extensive experiments have been conducted on several datasets from various real-world applications; the results show that the proposed algorithms are effective and efficient.
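
The abstract does not spell out how E-GAD works, so the following is not the paper's algorithm. It is a minimal sketch of one common form of activity detection for K-Means, under stated assumptions: a centroid is called "active" if it moved in the last update and "static" otherwise, and a point whose current centroid is static only needs to be compared against its own centroid and the active ones. All function names (`sq_dist`, `nearest`, `recompute_centers`, `kmeans_activity`) are hypothetical, chosen for illustration only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers, candidates):
    """Return (index of nearest candidate center, number of distances computed)."""
    best, best_d = None, float("inf")
    for j in candidates:
        d = sq_dist(point, centers[j])
        if d < best_d:
            best, best_d = j, d
    return best, len(candidates)


def recompute_centers(points, assign, old_centers):
    """Mean of each cluster; an empty cluster keeps its old center."""
    k, dim = len(old_centers), len(old_centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, assign):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(old_centers[j])
        for j in range(k)
    ]


def kmeans_activity(points, init_centers, max_iter=100):
    """Lloyd-style K-Means that skips comparisons against static centroids.

    Illustrative assumption, not the paper's E-GAD: a point assigned to a
    static centroid was already closer to it than to every other static
    centroid, so only the active (moved) centroids can overtake it.
    """
    k = len(init_centers)
    centers = [list(c) for c in init_centers]
    n_dist = 0
    assign = []
    for p in points:  # first pass has no history, so search all centers
        j, n = nearest(p, centers, range(k))
        assign.append(j)
        n_dist += n
    for _ in range(max_iter):
        new_centers = recompute_centers(points, assign, centers)
        active = [j for j in range(k) if new_centers[j] != centers[j]]
        centers = new_centers
        if not active:  # no centroid moved: converged
            break
        active_set = set(active)
        for i, p in enumerate(points):
            cur = assign[i]
            if cur in active_set:
                candidates = range(k)  # own centroid moved: full search
            else:
                candidates = [cur] + active  # static: check movers only
            assign[i], n = nearest(p, centers, candidates)
            n_dist += n
    return centers, assign, n_dist
```

Under these assumptions the sketch follows the same sequence of assignments as plain Lloyd iterations (barring exact distance ties), while the returned `n_dist` counter shows how many distance evaluations were actually needed; the saving grows as more centroids become static in later iterations.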
