Research and Application of DBSCAN Algorithm Based on Hadoop Platform

Authors: Xiufen Fu, Yaguang Wang, Yanna Ge, Peiwen Chen, Shaohua Teng

DOI: 10.1007/978-3-319-09265-2_9

Keywords:

Abstract: With the rapid development of the information age, more and more data can be obtained from the Internet, and it is very difficult to extract useful knowledge from these huge amounts of data. Building on existing DBSCAN-based algorithms, a new improved incremental DBSCAN clustering algorithm is proposed. It is combined with the open source cloud computing framework Hadoop and uses the MapReduce programming model, which makes distributed applications easy to write, to simplify the program: the data elements are divided into chunks, and the chunks are distributed across the cluster to run as a job. In this way, the algorithm integrates data mining with Hadoop. When a manipulation (add or delete) occurs in the database, only the changed data need to be mined and similar clusters merged to form the final mining result. Compared with serial mining of the whole data set on a single-node server, the time delay of data processing is reduced. In the last part, the paper verifies the effectiveness of the algorithm through experiments and analysis.
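The chunk-then-merge idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation and it does not use Hadoop: the function names (`dbscan`, `merge_clusters`, `chunked_dbscan`), the parameters `eps`, `min_pts` and `n_chunks`, and the merge rule (two local clusters are joined when any pair of their points lies within `eps`) are all assumptions made for illustration. The result can also differ from running DBSCAN on the whole data set, since points near chunk boundaries may be labelled as noise locally.

```python
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on one chunk; returns a list of clusters (lists of points)."""
    labels = {}  # point index -> cluster id, or -1 for noise
    cluster_id = 0

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise, may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j) == -1:
                labels[j] = cluster_id  # noise reached from a core point: border point
            if j in labels:
                continue
            labels[j] = cluster_id
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nb)
        cluster_id += 1

    clusters = [[] for _ in range(cluster_id)]
    for i, c in labels.items():
        if c >= 0:
            clusters[c].append(points[i])
    return clusters


def merge_clusters(clusters, eps):
    """Assumed merge rule: join clusters that have any two points within eps."""
    parent = list(range(len(clusters)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(clusters)), 2):
        if any(dist(p, q) <= eps for p in clusters[a] for q in clusters[b]):
            parent[find(a)] = find(b)

    merged = {}
    for idx, cluster in enumerate(clusters):
        merged.setdefault(find(idx), []).extend(cluster)
    return list(merged.values())


def chunked_dbscan(points, eps, min_pts, n_chunks):
    """Cluster each chunk independently ("map"), then merge the results ("reduce")."""
    size = max(1, -(-len(points) // n_chunks))  # ceiling division, at least 1
    chunks = [points[k:k + size] for k in range(0, len(points), size)]
    local = [c for chunk in chunks for c in dbscan(chunk, eps, min_pts)]
    return merge_clusters(local, eps)
```

Under the same assumptions, an incremental update could cluster only a newly added batch with `dbscan` and pass its clusters, together with the existing ones, to `merge_clusters`, rather than re-clustering the whole data set.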

References (9)
Guanghui Xu, Feng Xu, Hongxu Ma. Deploying and researching Hadoop in virtual machines. 2012 IEEE International Conference on Automation and Logistics, pp. 395-399 (2012). DOI: 10.1109/ICAL.2012.6308241
Shiori Kurazumi, Tomoaki Tsumura, Shoichi Saito, Hiroshi Matsuo. Dynamic Processing Slots Scheduling for I/O Intensive Jobs of Hadoop MapReduce. International Conference on Networking and Computing, pp. 288-292 (2012). DOI: 10.1109/ICNC.2012.53
Yaobin He, Haoyu Tan, Wuman Luo, Huajian Mao, Di Ma, Shengzhong Feng, Jianping Fan. MR-DBSCAN: An Efficient Parallel Density-Based Clustering Algorithm Using MapReduce. International Conference on Parallel and Distributed Systems, pp. 473-480 (2011). DOI: 10.1109/ICPADS.2011.83
Matei Zaharia, Ariel Rabkin, Michael Armbrust, David A. Patterson, Andrew Konwinski, Anthony D. Joseph, Gunho Lee, Ion Stoica, Randy H. Katz, Armando Fox, Rean Griffith. Above the Clouds: A Berkeley View of Cloud Computing. Science, vol. 53, pp. 07-013 (2009).
C.C. Aggarwal, P.S. Yu. A Survey of Uncertain Data Algorithms and Applications. IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 609-623 (2009). DOI: 10.1109/TKDE.2008.190
Jeffrey Dean, Sanjay Ghemawat. MapReduce. Communications of the ACM, vol. 51, pp. 107-113 (2008). DOI: 10.1145/1327452.1327492
Yuan Jin-sheng. Text Clustering Based on Improved DBSCAN Algorithm. Computer Engineering (2011).
Liu Wen. Study of Chameleon Clustering Algorithm and Implementation in Weka. Computer Systems and Applications (2010).
Wang Jiandong, Zhai Zhigang. Secure Model of Distributed Database Based on UCON. Computer Engineering, vol. 37, pp. 50-51 (2011).