An efficient algorithm for mining sequential generator pattern using prefix trees and hash tables

Authors: Thi Thiet Pham, Jiawei Luo, Tzung Pei Hong

DOI: 10.1504/IJISTA.2014.065151

Abstract: Mining long frequent sequences that contain a combinatorial number of frequent subsequences, or using very low support thresholds to mine sequential patterns, is both time- and memory-consuming. The mining of closed sequential patterns, sequential generator patterns, and maximum sequential patterns has been proposed to overcome this problem. This paper proposes an algorithm for generating all sequential generator patterns. It uses a vertical approach to listing and counting the support of each sequence, based on prime block encoding, to represent candidate sequences and determine the frequency of each candidate. The search space of the proposed algorithm is much smaller than those of other algorithms because super-sequence frequency-based pruning and non-generator-based pruning are applied. Besides, hash tables are also used for fast checking of existing sequential generator patterns. Experimental results conducted on synthetic and real databases show that the proposed algorithm is effective.
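
To make the term concrete: a frequent sequence is usually called a sequential generator pattern when no proper subsequence of it has the same support. The sketch below illustrates only that definition and the idea of a hash table keyed by support for checking already-found generators. It is not the paper's algorithm: it uses brute-force level-wise enumeration instead of prefix trees and prime block encoding, treats each sequence as a list of single items rather than itemsets, and ignores the empty sequence. All function names are hypothetical.

```python
from collections import defaultdict


def is_subsequence(small, big):
    """Return True if `small` occurs in `big` in order (gaps allowed)."""
    it = iter(big)
    return all(item in it for item in small)


def support(pattern, database):
    """Count the database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mine_generators(database, min_support):
    """Enumerate frequent patterns level by level and keep the generators.

    Assumes min_support >= 1. Shorter patterns are processed first, so any
    generator that could disqualify a candidate is already in the table.
    """
    items = sorted({item for seq in database for item in seq})
    by_support = defaultdict(list)  # hash table: support -> generators found
    frontier = [()]
    while frontier:
        next_frontier = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                sup = support(candidate, database)
                if sup < min_support:
                    continue  # infrequent: no extension can be frequent
                next_frontier.append(candidate)
                # Only generators with the same support can disqualify it.
                if not any(is_subsequence(g, candidate) for g in by_support[sup]):
                    by_support[sup].append(candidate)
        frontier = next_frontier
    return {g: sup for sup, found in by_support.items() for g in found}


database = [("a", "b", "c"), ("a", "b", "c"), ("b", "a"), ("d",)]
generators = mine_generators(database, min_support=2)
print(generators)
```

On this toy database, ("a", "c"), ("b", "c") and ("a", "b", "c") are frequent but are not reported, because their subsequence ("c",) already has the same support of 2. Grouping generators by support means each candidate is compared only against patterns that could actually disqualify it.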

References (24)
Thi-Thiet Pham, Jiawei Luo, Tzung-Pei Hong, Bay Vo. MSGPs: A Novel Algorithm for Mining Sequential Generator Patterns. Computational Collective Intelligence. Technologies and Applications, pp. 393-401 (2012). DOI: 10.1007/978-3-642-34707-8_40
Thien-Trang Van, Bay Vo, Bac Le. Mining Sequential Rules Based on Prefix-Tree. Asian Conference on Intelligent Information and Database Systems, pp. 147-156 (2011). DOI: 10.1007/978-3-642-19953-0_15
David Lo, Siau-Cheng Khoo, Jinyan Li. Mining and Ranking Generators of Sequential Pattern. SIAM International Conference on Data Mining, pp. 553-564 (2008).
Jian Pei, Guozhu Dong, Jinyan Li, Limsoon Wong, Haiquan Li. Minimum description length principle: generators are preferable to closed patterns. National Conference on Artificial Intelligence, vol. 1, pp. 409-414 (2006).
Mohammed J. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning, vol. 42, pp. 31-60 (2001). DOI: 10.1023/A:1007652502315
Ramakrishnan Srikant, Rakesh Agrawal. Mining sequential patterns: Generalizations and performance improvements. Advances in Database Technology — EDBT '96, pp. 1-17 (1996). DOI: 10.1007/BFB0014140
Jiawei Han, Ramin Afshar, Xifeng Yan. CloSpan: Mining Closed Sequential Patterns in Large Databases. SIAM International Conference on Data Mining, pp. 166-177 (2003).
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. International Conference on Management of Data, vol. 26, pp. 255-264 (1997). DOI: 10.1145/253260.253325
Congnan Luo, Soon M. Chung. A scalable algorithm for mining maximal frequent sequences using a sample. Knowledge and Information Systems, vol. 15, pp. 149-179 (2008). DOI: 10.1007/S10115-006-0056-0