Authors: Xiufen Fu, Yaguang Wang, Yanna Ge, Peiwen Chen, Shaohua Teng
DOI: 10.1007/978-3-319-09265-2_9
Keywords:
Abstract: With the rapid development of the information age, more and more data can be obtained from the Internet, and it is very difficult to extract useful knowledge from these huge amounts of data. Building on existing DBSCAN-based algorithms, a new improved incremental DBSCAN clustering algorithm is proposed. It is combined with the open-source cloud computing framework Hadoop and uses the MapReduce programming model, which makes distributed applications easy to write and simplifies programming: the data elements are divided into chunks, distributed across the cluster, and run as a job. In this way, data mining is integrated with Hadoop through the algorithm. When a manipulation (add or delete) has occurred in the database, only the changed data need to be mined; similar clusters are then merged to form the final mining result. Compared with serial mining of the entire data set on a single-node server, the processing time delay is reduced. In the last part, the paper verifies the effectiveness of the algorithm through experiments and analysis.
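
The chunk-then-merge idea described in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's implementation: it uses plain Python instead of Hadoop, the function names (`dbscan`, `merge_clusters`, `incremental_update`) and the sample points are hypothetical, and the merge rule (join two clusters when any pair of their points lies within `eps`) is an assumed stand-in for the paper's notion of "similar clusters". Deletions, and new points that change the density around existing ones, are not handled here.

```python
from math import dist  # Python 3.8+

def dbscan(points, eps, min_pts):
    """Plain DBSCAN over distinct point tuples; returns clusters, drops noise."""
    label = {}      # point -> cluster index, or -1 for noise
    clusters = []
    for p in points:
        if p in label:
            continue
        seeds = [q for q in points if dist(p, q) <= eps]
        if len(seeds) < min_pts:
            label[p] = -1            # noise for now; may become a border point
            continue
        cid = len(clusters)
        clusters.append([])
        while seeds:
            q = seeds.pop()
            if label.get(q, -1) != -1:
                continue             # already assigned to a cluster
            was_unvisited = q not in label
            label[q] = cid
            clusters[cid].append(q)
            if was_unvisited:
                q_nbrs = [r for r in points if dist(q, r) <= eps]
                if len(q_nbrs) >= min_pts:   # q is a core point: keep expanding
                    seeds.extend(q_nbrs)
    return clusters

def merge_clusters(clusters, eps):
    """'Reduce' step (assumed rule): join clusters that touch within eps."""
    merged = []
    for cluster in clusters:
        cluster = list(cluster)
        keep = []
        for other in merged:
            if any(dist(p, q) <= eps for p in cluster for q in other):
                cluster.extend(other)
            else:
                keep.append(other)
        keep.append(cluster)
        merged = keep
    return merged

def incremental_update(existing_clusters, new_points, eps, min_pts):
    """Cluster only the newly added points, then merge with existing clusters."""
    return merge_clusters(existing_clusters + dbscan(new_points, eps, min_pts), eps)

EPS, MIN_PTS = 1.0, 3

# "Map" step: each chunk is clustered independently (hypothetical sample data).
chunk_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
chunk_b = [(2, 0), (2, 1), (3, 0), (3, 1), (10, 10)]
local = dbscan(chunk_a, EPS, MIN_PTS) + dbscan(chunk_b, EPS, MIN_PTS)

# "Reduce" step: local clusters that touch across the chunk boundary are merged.
clusters = merge_clusters(local, EPS)

# Incremental step: only the added points are mined, then merged in.
added = [(20, 20), (20, 21), (21, 20)]
updated = incremental_update(clusters, added, EPS, MIN_PTS)
```

Clustering chunks separately and merging afterwards is only an approximation of running DBSCAN on the full data set, because a point near a chunk boundary may be a core point globally but not within its own chunk.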