PFIN: A Parallel Frequent Itemset Mining Algorithm Using Nodesets

Authors: Chen Lin, Junzhong Gu

DOI: 10.14257/IJDTA.2016.9.6.08

Keywords: Load balancing (computing), Hash function, Association rule learning, Data mining algorithm, Resource use, Scalability, Cluster analysis, Computer science, Data structure, Data mining

Abstract: Frequent Itemset Mining (FIM) is one of the most fundamental techniques in data mining, with extensive applications to a variety of problems such as association rule mining, correlations, clustering and classification. Since frequent itemset mining was first proposed, numerous serial algorithms have been developed to improve its performance, yet they cannot scale to the massive datasets that are very common nowadays. In this paper, we propose a new parallel FIM algorithm named PFIN, based on the Nodeset, a more efficient data structure for representing itemsets. PFIN can intelligently decompose a large-scale mining problem into a set of tasks, each of which can be executed without unnecessary communication overhead. Moreover, a hash-based load balancing strategy is adopted to optimize resource use and maximize throughput. To evaluate the performance of PFIN, we conduct experiments on Spark, an emerging distributed in-memory processing framework, and compare it against PFP, a state-of-the-art parallel algorithm, on a range of real datasets. The experimental results demonstrate that PFIN offers highly competitive scalability and outperforms PFP in speed.
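The abstract only names two ideas: splitting the mining problem into tasks that need no communication, and spreading those tasks with a hash function. The Python sketch below illustrates one generic way to combine them. It is not PFIN's actual method and does not use Nodesets. Each frequent item gets a task whose input is, for every transaction containing that item, the frequent items ranked before it in descending-support order. A CRC32 hash of the item picks the partition that owns the task, and each task is mined locally with a plain depth-first projection. All function names (`build_tasks`, `mine_task`, `mine`) and the ranking and hash choices are hypothetical. The partitions also run serially here rather than on Spark.

```python
from collections import Counter, defaultdict
import zlib


def rank_frequent_items(transactions, min_sup):
    """Map each frequent item to its rank (0 = most frequent; ties by name)."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in counts.items() if c >= min_sup]
    frequent.sort(key=lambda i: (-counts[i], i))
    return {item: r for r, item in enumerate(frequent)}


def partition_of(item, num_partitions):
    """Hypothetical stable hash assignment (built-in hash() of str is salted)."""
    return zlib.crc32(item.encode("utf-8")) % num_partitions


def build_tasks(transactions, min_sup, num_partitions):
    """Return {partition: {item: conditional database}}.

    The conditional database of `item` holds, for every transaction that
    contains it, the frequent items ranked before it.
    """
    rank = rank_frequent_items(transactions, min_sup)
    tasks = defaultdict(dict)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        for pos, item in enumerate(items):
            group = tasks[partition_of(item, num_partitions)]
            group.setdefault(item, []).append(items[:pos])
    return tasks


def mine_task(item, cond_db, min_sup):
    """Mine all frequent itemsets whose lowest-ranked item is `item`."""
    results = {(item,): len(cond_db)}  # one entry per transaction with `item`

    def grow(suffix, db):
        counts = Counter(i for prefix in db for i in prefix)
        for i, c in counts.items():
            if c >= min_sup:
                itemset = (i,) + suffix
                results[itemset] = c
                # Project onto prefixes containing i, keeping only items
                # ranked before i so each itemset is generated once.
                grow(itemset, [p[:p.index(i)] for p in db if i in p])

    grow((item,), cond_db)
    return results


def mine(transactions, min_sup, num_partitions=2):
    """Run every task; no task reads data owned by another partition."""
    frequent = {}
    for groups in build_tasks(transactions, min_sup, num_partitions).values():
        for item, cond_db in groups.items():
            frequent.update(mine_task(item, cond_db, min_sup))
    return frequent


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
    for itemset, support in sorted(mine(db, 3).items()):
        print(itemset, support)
```

The task of its lowest-ranked item produces each itemset exactly once. As a result, the per-partition outputs can be merged without deduplication, whatever number of partitions the hash spreads the tasks over.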

References (17)

David C. Anastasiu, Jeremy Iverson, Shaden Smith, George Karypis. Big Data Frequent Pattern Mining. In Frequent Pattern Mining, pp. 225-259, 2014. DOI: 10.1007/978-3-319-07821-2_10

Iko Pramudiono, Masaru Kitsuregawa. Parallel FP-Growth on PC Cluster. In Knowledge Discovery and Data Mining, pp. 467-473, 2003. DOI: 10.5555/1760894.1760956

Ramakrishnan Srikant, Rakesh Agrawal. Fast Algorithms for Mining Association Rules in Large Databases. In Very Large Data Bases, pp. 487-499, 1994.

Ron Rymon. Search through Systematic Set Enumeration. In Principles of Knowledge Representation and Reasoning, pp. 539-550, 1992.

Hannu Toivonen. Sampling Large Databases for Association Rules. In Very Large Data Bases, pp. 134-145, 1996.

Ming-Yen Lin, Pei-Yu Lee, Sue-Chen Hsueh. Apriori-based Frequent Itemset Mining Algorithms on MapReduce. In Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication (ICUIMC '12), pp. 76-, 2012. DOI: 10.1145/2184751.2184842

Jong Soo Park, Ming-Syan Chen, Philip S. Yu. An Effective Hash-Based Algorithm for Mining Association Rules. In International Conference on Management of Data, vol. 24, pp. 175-186, 1995. DOI: 10.1145/223784.223813

Haoyuan Li, Yi Wang, Dong Zhang, Ming Zhang, Edward Y. Chang. PFP: Parallel FP-Growth for Query Recommendation. In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08), pp. 107-114, 2008. DOI: 10.1145/1454008.1454027

R. Agrawal, J.C. Shafer. Parallel Mining of Association Rules. IEEE Transactions on Knowledge and Data Engineering, vol. 8, pp. 962-969, 1996. DOI: 10.1109/69.553164

Hongjian Qiu, Rong Gu, Chunfeng Yuan, Yihua Huang. YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark. In International Parallel and Distributed Processing Symposium, pp. 1664-1671, 2014. DOI: 10.1109/IPDPSW.2014.185

Similar articles (2)

Multi-level dataset decomposition for parallel frequent itemset mining on a cluster of personal computers

FPO tree and DP3 algorithm for distributed parallel frequent itemsets mining