Tree Projection-Based Frequent Itemset Mining on Multicore CPUs and GPUs

作者: George Teodoro , Nathan Mariano , Wagner Meira Jr. , Renato Ferreira

DOI: 10.1109/SBAC-PAD.2010.15

关键词:

摘要: Frequent itemset mining (FIM) is a core operation for several data applications as association rules computation, correlations, document classification, and many others, which has been extensively studied over the last decades. Moreover, databases are becoming increasingly larger, thus requiring higher computing power to mine them in reasonable time. At same time, advances high performance platforms transforming into hierarchical parallel environments equipped with multi-core processors many-core accelerators, such GPUs. Thus, fully exploiting these systems perform FIM tasks poses challenging critical problem that we address this paper. We present efficient GPU accelerated parallelizations of Tree Projection, one most competitive algorithms. The experimental results show our Projection implementation scales almost linearly CPU shared-memory environment after careful optimizations, while versions up 173 times faster than standard version.

参考文章(18)
Srinivasan Parthasarathy, Mitsunori Ogihara, Mohammed J Zaki, Wei Li, New algorithms for fast discovery of association rules knowledge discovery and data mining. pp. 283- 286 ,(1997)
Ramakrishnan Srikant, Rakesh Agrawal, Fast Algorithms for Mining Association Rules in Large Databases very large data bases. pp. 487- 499 ,(1994)
David R. Butenhof, Programming with POSIX threads ,(1993)
Joseph S. Berrios, Manuel E. Bermudez, Using wait-free synchronization in the design of distributed applications Future Generation Computer Systems. ,vol. 22, pp. 46- 56 ,(2006) , 10.1016/J.FUTURE.2004.11.012
M.J. Zaki, Parallel and distributed association mining: a survey IEEE Concurrency. ,vol. 7, pp. 14- 25 ,(1999) , 10.1109/4434.806975
Gregory Buehrer, Srinivasan Parthasarathy, Shirish Tatikonda, Tahsin Kurc, Joel Saltz, Toward terabyte pattern mining Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '07. pp. 2- 12 ,(2007) , 10.1145/1229428.1229432
Wenbin Fang, Mian Lu, Xiangye Xiao, Bingsheng He, Qiong Luo, Frequent itemset mining on graphics processors Proceedings of the Fifth International Workshop on Data Management on New Hardware - DaMoN '09. pp. 34- 42 ,(2009) , 10.1145/1565694.1565702
Ramesh C. Agarwal, Charu C. Aggarwal, V.V.V. Prasad, A Tree Projection Algorithm for Generation of Frequent Item Sets Journal of Parallel and Distributed Computing. ,vol. 61, pp. 350- 371 ,(2001) , 10.1006/JPDC.2000.1693
George Teodoro, Timothy D. R. Hartley, Umit Catalyurek, Renato Ferreira, Run-time optimizations for replicated dataflows on heterogeneous environments high performance distributed computing. pp. 13- 24 ,(2010) , 10.1145/1851476.1851479
Jiawei Han, Jian Pei, Yiwen Yin, Mining frequent patterns without candidate generation international conference on management of data. ,vol. 29, pp. 1- 12 ,(2000) , 10.1145/335191.335372