Authors: Md. Mostofa Ali Patwary, Nadathur Satish, Narayanan Sundaram, Fredrik Manne, Salman Habib
DOI: 10.1109/SC.2014.51
Keywords:
Abstract: DBSCAN is a widely used isodensity-based clustering algorithm for particle data, well known for its ability to isolate arbitrarily shaped clusters and to filter noise data. The algorithm is super-linear (O(n log n)) and computationally expensive for large datasets. Given the need for speed, we propose a fast heuristic algorithm for DBSCAN using density-based sampling, which performs equally well in quality compared to exact algorithms, but is more than an order of magnitude faster. Our experiments on astrophysics and synthetic massive datasets (8.5 billion numbers) show that our approximate algorithm is up to 56x faster than exact algorithms with almost identical quality (Omega-Index ≥ 0.99). We develop a new parallel DBSCAN algorithm, which uses dynamic partitioning to improve load balancing and locality. We demonstrate near-linear speedup on shared memory (15x using 16 cores, single node Intel® Xeon® processor) and distributed memory (3917x using 4096 cores, multinode) computers, with 2x additional performance improvement using Intel® Xeon Phi™ coprocessors. Additionally, existing exact algorithms can achieve up to 3.4 times speedup using dynamic partitioning.