Authors: Chen Lin, Junzhong Gu
DOI: 10.14257/IJDTA.2016.9.6.08
Keywords: Load balancing (computing), Hash function, Association rule learning, Data mining algorithm, Resource use, Scalability, Cluster analysis, Computer science, Data structure, Data mining
Abstract: Frequent Itemset Mining (FIM) is one of the most fundamental techniques in data mining, with extensive applications to a variety of problems such as association rule mining, correlations, clustering and classification. Since the first proposal of frequent itemset mining, numerous serial algorithms have been proposed in order to improve performance, yet most of them cannot scale to massive datasets, which are very common nowadays. In this paper, we propose a new parallel FIM algorithm named PFIN, based on Nodeset, a more efficient data structure for mining frequent itemsets. PFIN can intelligently decompose a large-scale FIM problem into a set of tasks, where each task can be executed without unnecessary communication overheads. Moreover, a hash-based load balancing strategy has been adopted to optimize resource use and maximize throughput. To evaluate the performance of PFIN, we conduct experiments on Spark, an emerging distributed in-memory processing framework, and compare it against PFP, a state-of-the-art algorithm, on a range of real datasets. The experimental results demonstrate that our algorithm is highly competitive in scalability and outperforms PFP in speed.
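
The abstract does not spell out how the hash-based load balancing works, so the following is only a minimal sketch of the general idea, not PFIN's actual strategy. It assumes items are strings, that a deterministic hash (CRC32 here) picks the partition, and that per-item support counts are a rough proxy for task cost. The names `assign_partition`, `partition_items`, and `load_per_partition` are hypothetical.

```python
import zlib
from collections import Counter


def assign_partition(item: str, num_partitions: int) -> int:
    """Map an item to a partition using a deterministic hash (CRC32).

    Assumption: CRC32 stands in for whatever hash function the paper uses.
    """
    return zlib.crc32(item.encode("utf-8")) % num_partitions


def partition_items(items, num_partitions):
    """Group items by the partition that their hash selects."""
    groups = {p: [] for p in range(num_partitions)}
    for item in items:
        groups[assign_partition(item, num_partitions)].append(item)
    return groups


def load_per_partition(item_counts, num_partitions):
    """Sum the support counts that land on each partition.

    Assumption: support count approximates the work an item generates.
    """
    load = Counter()
    for item, count in item_counts.items():
        load[assign_partition(item, num_partitions)] += count
    return [load[p] for p in range(num_partitions)]
```

Because the assignment depends only on the item and the partition count, every worker can compute it locally without coordination, which is one plausible way to keep communication overhead low. Comparing the values returned by `load_per_partition` gives a quick check of how evenly the work is spread.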