PFIN: A Parallel Frequent Itemset Mining Algorithm Using Nodesets

Authors: Chen Lin, Junzhong Gu

DOI: 10.14257/IJDTA.2016.9.6.08

Keywords: Load balancing (computing), Hash function, Association rule learning, Data mining algorithm, Resource use, Scalability, Cluster analysis, Computer science, Data structure, Data mining

Abstract: Frequent Itemset Mining (FIM) is one of the most fundamental techniques in data mining, with extensive applications to a variety of problems such as association rule mining, correlations, clustering, and classification. Since the first proposal of frequent itemset mining, numerous serial algorithms have been proposed in order to improve performance, yet most of them cannot scale to the massive datasets that are very common nowadays. In this paper, we propose a new parallel FIM algorithm named PFIN, based on Nodeset, a more efficient data structure for mining frequent itemsets. PFIN can intelligently decompose a large-scale mining problem into a set of tasks, where each task can be executed without unnecessary communication overheads. Moreover, a hash-based load balancing strategy has been adopted to optimize resource use and maximize throughput. To evaluate the performance of PFIN, we conduct experiments on Spark, an emerging distributed in-memory processing framework, and compare it against PFP, a state-of-the-art parallel FIM algorithm, on a range of real datasets. The experimental results demonstrate that our algorithm is highly competitive in scalability while outperforming PFP in speed.
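
The abstract does not spell out how the mining problem is decomposed or how the hash-based balancing works, so the following Python sketch is only an illustration of the general idea, not PFIN's actual procedure. It assumes a simple scheme in which each frequent item is assigned to a group by a stable hash, and each group receives the transaction prefixes it needs to mine its own items independently (similar in spirit to the item-group projection used by parallel FP-growth variants). The names `frequent_items`, `group_of`, and `partition_tasks` are hypothetical, and the Nodeset structure itself is not modeled.

```python
from collections import Counter, defaultdict
import zlib


def frequent_items(transactions, min_support):
    """Count item supports and keep items that meet the threshold."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def group_of(item, num_groups):
    """Assumed hash rule: a stable CRC32 hash modulo the number of groups.

    Python's built-in hash() of str is salted per process, so it would not
    give the same assignment on every worker.
    """
    return zlib.crc32(item.encode("utf-8")) % num_groups


def partition_tasks(transactions, min_support, num_groups):
    """Split the input into per-group projected transaction lists.

    Each group can then be mined on its own, without talking to the others.
    """
    freq = frequent_items(transactions, min_support)
    # Rank frequent items by descending support (ties broken by name).
    order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}

    tasks = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
        seen = set()
        # Walk from the least frequent item backwards; each group receives
        # the longest prefix that ends in one of its own items, once.
        for pos in range(len(items) - 1, -1, -1):
            g = group_of(items[pos], num_groups)
            if g not in seen:
                seen.add(g)
                tasks[g].append(items[: pos + 1])
    return dict(tasks)


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
    for group, projected in sorted(partition_tasks(demo, 2, 2).items()):
        print(group, projected)
```

In a Spark setting, each group's projected list would correspond to one independent task. Hashing spreads items across groups without a central scheduler, though it does not by itself guarantee equal work per group.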
