Pardicle: parallel approximate density-based clustering

作者: Md. Mostofa Ali Patwary , Nadathur Satish , Narayanan Sundaram , Fredrik Manne , Salman Habib

DOI: 10.1109/SC.2014.51

关键词:

摘要: Dbscan is a widely used isodensity-based clustering algorithm for particle data well-known its ability to isolate arbitrarily-shaped clusters and filter noise data. The super-linear (O(nlogn)) computationally expensive large datasets. Given the need speed, we propose fast heuristic using density based sampling, which performs equally well in quality compared exact algorithms, but more than an order of magnitude faster. Our experiments on astrophysics synthetic massive datasets (8.5 billion numbers) shows that our approximate up 56x faster algorithms with almost identical (Omega-Index ≥ 0.99). We develop new parallel algorithm, uses dynamic partitioning improve load balancing locality. demonstrate near-linear speedup shared memory (15x 16 cores, single node Intel® Xeon® processor) distributed (3917x 4096 multinode) computers, 2x additional performance improvement Xeon Phi™ coprocessors. Additionally, existing can achieve 3.4 times partitioning.

参考文章(52)
Richard R. Muntz, Jiong Yang, Wei Wang, STING: A Statistical Information Grid Approach to Spatial Data Mining very large data bases. pp. 186- 195 ,(1997)
David G. Lowe, Marius Muja, FAST APPROXIMATE NEAREST NEIGHBORS WITH AUTOMATIC ALGORITHM CONFIGURATION international conference on computer vision theory and applications. pp. 331- 340 ,(2009)
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial Databases with Noise knowledge discovery and data mining. pp. 226- 231 ,(1996)
Ying Liu, Wei-keng Liao, A. Choudhary, Design and evaluation of a parallel HOP clustering algorithm for cosmological simulation international parallel and distributed processing symposium. pp. 82- ,(2003) , 10.1109/IPDPS.2003.1213186
Derya Birant, Alp Kut, ST-DBSCAN: An algorithm for clustering spatial-temporal data data and knowledge engineering. ,vol. 60, pp. 208- 221 ,(2007) , 10.1016/J.DATAK.2006.01.013
Boleslaw K. Szymanski, Stephen Kelley, Jierui Xie, Overlapping community detection in networks: The state-of-the-art and comparative study ACM Computing Surveys. ,vol. 45, pp. 43- ,(2013) , 10.1145/2501654.2501657
Zarija Lukić, Darren Reed, Salman Habib, Katrin Heitmann, THE STRUCTURE OF HALOS: IMPLICATIONS FOR GROUP AND CLUSTER COSMOLOGY The Astrophysical Journal. ,vol. 692, pp. 217- 228 ,(2009) , 10.1088/0004-637X/692/1/217
Robert Endre Tarjan, A class of algorithms which require nonlinear time to maintain disjoint sets Journal of Computer and System Sciences. ,vol. 18, pp. 110- 127 ,(1979) , 10.1016/0022-0000(79)90042-4
Guilherme Andrade, Gabriel Ramos, Daniel Madeira, Rafael Sachetto, Renato Ferreira, Leonardo Rocha, G-DBSCAN: A GPU Accelerated Algorithm for Density-based Clustering international conference on conceptual structures. ,vol. 18, pp. 369- 378 ,(2013) , 10.1016/J.PROCS.2013.05.200