GPUMAFIA: Efficient Subspace Clustering with MAFIA on GPUs

Authors: Andrew Adinetz, Jiri Kraus, Jan Meinke, Dirk Pleiter

DOI: 10.1007/978-3-642-40047-6_83

Keywords:

Abstract: Clustering, i.e., the identification of regions of similar objects in a multi-dimensional data set, is a standard method of data analytics with a large variety of applications. For high-dimensional data, subspace clustering can be used to find clusters among a certain subset of data point dimensions and to alleviate the curse of dimensionality. In this paper we focus on the MAFIA algorithm for subspace clustering and use GPUs to accelerate it. We first present a number of algorithmic changes and estimate their effect on computational complexity. These changes improve the sequential version by 1–2 orders of magnitude on practical datasets while providing exactly the same output. We then present a GPU version of the algorithm, which for typical datasets provides a further speedup over a single CPU core, or about an order of magnitude over a multi-core CPU. We believe that our faster implementation widens the applicability of subspace clustering.
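MAFIA belongs to the grid-based, bottom-up family of subspace clustering algorithms that grew out of CLIQUE: it partitions each dimension into bins (adaptive ones in MAFIA), keeps the "dense" units whose point counts exceed a threshold, and combines dense units from lower-dimensional subspaces into candidates for higher-dimensional ones. The sketch below illustrates only that bottom-up dense-unit search, under stated simplifying assumptions: a uniform grid instead of MAFIA's adaptive grid, a single hypothetical global threshold, no Apriori-style subset pruning, and point-set bitmaps (Python ints) for support counting. The function names are hypothetical; this is not the authors' sequential or GPU implementation.

```python
from itertools import combinations


def popcount(x):
    """Number of set bits, i.e. number of points recorded in a bitmap."""
    return bin(x).count("1")


def one_dim_units(points, bins_per_dim, lo, hi):
    """Map each 1-D unit ((dim, bin),) to a bitmap of the points inside it.

    Assumption: uniform bins per dimension (MAFIA itself uses adaptive bins).
    """
    widths = [(hi[d] - lo[d]) / bins_per_dim for d in range(len(lo))]
    bitmaps = {}
    for i, p in enumerate(points):
        for d, x in enumerate(p):
            # Clamp so that x == hi[d] falls into the last bin.
            b = min(int((x - lo[d]) / widths[d]), bins_per_dim - 1)
            key = ((d, b),)
            bitmaps[key] = bitmaps.get(key, 0) | (1 << i)
    return bitmaps


def dense_units(points, bins_per_dim, lo, hi, threshold):
    """Bottom-up search for dense units in all subspaces (simplified sketch).

    A unit is a sorted tuple of (dim, bin) pairs with one pair per dimension;
    its bitmap is the AND of the bitmaps of its 1-D units.
    """
    level = {u: bm for u, bm in one_dim_units(points, bins_per_dim, lo, hi).items()
             if popcount(bm) >= threshold}
    result = dict(level)
    k = 1
    while level:
        candidates = {}
        for (u, bu), (v, bv) in combinations(level.items(), 2):
            merged = tuple(sorted(set(u) | set(v)))
            dims = {d for d, _ in merged}
            # Two k-dim units sharing k-1 (dim, bin) pairs yield a (k+1)-dim
            # candidate, provided it still has one bin per dimension.
            if len(merged) == k + 1 and len(dims) == k + 1:
                candidates[merged] = bu & bv
        level = {c: bm for c, bm in candidates.items() if popcount(bm) >= threshold}
        result.update(level)
        k += 1
    return result


pts = [(0.1, 0.1), (0.12, 0.11), (0.11, 0.13), (0.9, 0.1), (0.1, 0.9)]
units = dense_units(pts, bins_per_dim=2, lo=(0.0, 0.0), hi=(1.0, 1.0), threshold=3)
for unit, bitmap in sorted(units.items()):
    print(unit, popcount(bitmap))
```

In this toy run, one bin in each dimension is dense, and their combination is a dense 2-D unit holding the three clustered points. Representing point sets as bitmaps turns support counting into bitwise AND plus popcount, which is one common way to make this step data-parallel; whether and how the paper's GPU version does this is not stated in the abstract.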
