DSM-FI: an efficient algorithm for mining frequent itemsets in data streams

Authors: Hua-Fu Li, Man-Kwan Shan, Suh-Yin Lee

DOI: 10.1007/S10115-007-0112-4

Abstract: Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult problem, since streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm, called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest (summary frequent itemset forest), for maintaining the set of all frequent itemsets embedded in the transaction data stream generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis and experimental studies show that the proposed algorithm uses stable memory, makes only one pass over an online transactional data stream, and outperforms the existing algorithms for one-pass mining of frequent itemsets.
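The projection-and-insertion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a transaction is projected into the suffixes of its sorted item list, and it stores the sub-transactions in plain prefix trees with exact counts. The abstract does not specify the projection rule, the node layout, or any pruning, so the names `Node`, `project`, `insert`, and `support` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item on a prefix-tree path; count = sub-transactions passing through it."""
    count: int = 0
    children: dict = field(default_factory=dict)


def project(transaction):
    """Assumed projection: every suffix of the sorted, de-duplicated transaction."""
    items = sorted(set(transaction))
    return [items[i:] for i in range(len(items))]


def insert(forest, transaction):
    """Insert all sub-transactions of one transaction; forest maps root item -> Node."""
    for sub in project(transaction):
        level = forest
        for item in sub:
            node = level.setdefault(item, Node())
            node.count += 1
            level = node.children


def support(forest, itemset):
    """Number of inserted transactions that contain every item of itemset."""
    items = sorted(set(itemset))
    if not items or items[0] not in forest:
        return 0
    wanted, last = set(items), items[-1]

    def walk(item, node, seen):
        seen = seen | ({item} & wanted)
        if item == last:
            # Every matching suffix passes through exactly one node labeled `last`.
            return node.count if seen == wanted else 0
        # Paths are sorted, so children larger than `last` cannot reach it.
        return sum(walk(child_item, child, seen)
                   for child_item, child in node.children.items()
                   if child_item <= last)

    return walk(items[0], forest[items[0]], frozenset())


# Single pass over a small stream of transactions.
forest = {}
for transaction in (["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c", "d"]):
    insert(forest, transaction)

print(support(forest, {"a", "c"}))
```

Each transaction is read once and discarded after insertion, which matches the single-pass setting. This sketch keeps every path and never prunes, so it does not reproduce the stable memory usage that the abstract reports for DSM-FI.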
