PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce

Authors: Weizhong Zhao, V. Martha, Xiaowei Xu

DOI: 10.1109/AINA.2013.47

Keywords:

Abstract: Big data, such as complex networks with millions of vertices and edges, is infeasible to process using conventional computation. MapReduce is a programming model that empowers us to analyze big data in a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for Networks (PSCAN) for the detection of clusters or community structures in big networks such as Twitter. PSCAN is based on the structural clustering algorithm SCAN, which not only finds clusters accurately, but also identifies vertices playing special roles such as hubs and outliers. An empirical evaluation of PSCAN using both real and synthetic networks demonstrated an outstanding performance in terms of accuracy and running time. We analyzed a Twitter network with 40 million users and 1.4 billion follower/following relationships by using PSCAN on a Hadoop cluster of 15 computers. The result shows that PSCAN successfully detected interesting communities of people who share common interests.
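The abstract does not spell out how structural clustering scores an edge. As a rough illustration only, the sketch below assumes the structural similarity used by SCAN: the number of shared vertices in the two closed neighborhoods (each vertex plus its neighbors) divided by the geometric mean of the neighborhood sizes. It is arranged as a map step and a reduce step in plain Python rather than on Hadoop. The function names, the toy graph, and the in-memory grouping are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def map_edges(adjacency):
    """Map step: every vertex emits its closed neighborhood once per incident edge."""
    for u, nbrs in adjacency.items():
        closed = frozenset(nbrs) | {u}  # neighbors plus the vertex itself
        for v in nbrs:
            # Canonical (small, large) key so both endpoints hit the same reducer.
            yield (min(u, v), max(u, v)), closed


def reduce_similarity(pairs):
    """Reduce step: each edge key receives two neighborhoods, one per endpoint."""
    grouped = defaultdict(list)
    for edge, closed in pairs:
        grouped[edge].append(closed)
    return {
        edge: len(a & b) / sqrt(len(a) * len(b))
        for edge, (a, b) in grouped.items()
    }


def structural_similarity(adjacency):
    """Score every edge of an undirected graph given as {vertex: set_of_neighbors}."""
    return reduce_similarity(map_edges(adjacency))


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adjacency = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
sim = structural_similarity(adjacency)
```

In this toy graph the edges inside a triangle score higher than the bridging edge 2-3, so a similarity threshold between the two values would separate the triangles into two clusters.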

References (14)
Joydeep Ghosh, Raymond Mooney, Alexander Strehl, Impact of Similarity Measures on Web-page Clustering. (2000)
Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A density-based algorithm for discovering clusters in large spatial databases with noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996)
Andrea Lancichinetti, Santo Fortunato, Filippo Radicchi, Benchmark graphs for testing community detection algorithms. Physical Review E, vol. 78, p. 046110 (2008), 10.1103/PHYSREVE.78.046110
Bin Wu, YaHong Du, Cloud-based Connected Component Algorithm. 2010 International Conference on Artificial Intelligence and Computational Intelligence, vol. 3, pp. 122-126 (2010), 10.1109/AICI.2010.360
Aaron Clauset, M. E. J. Newman, Cristopher Moore, Finding community structure in very large networks. Physical Review E, vol. 70, p. 066111 (2004), 10.1103/PHYSREVE.70.066111
M. E. J. Newman, M. Girvan, Finding and evaluating community structure in networks. Physical Review E, vol. 69, p. 026113 (2004), 10.1103/PHYSREVE.69.026113
Haewoon Kwak, Changhyun Lee, Hosung Park, Sue Moon, What is Twitter, a social network or a news media? The Web Conference, pp. 591-600 (2010), 10.1145/1772690.1772751
Jianbo Shi, J. Malik, Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 888-905 (2000), 10.1109/34.868688
Usha Nandini Raghavan, Réka Albert, Soundar Kumara, Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, vol. 76, p. 036106 (2007), 10.1103/PHYSREVE.76.036106
Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, Thomas A. J. Schweiger, SCAN. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824-833 (2007), 10.1145/1281192.1281280