Authors: Hui Zheng, Peng Li, Qing Liu, Jinjun Chen, Guangli Huang
DOI: 10.1016/J.INS.2019.11.023
Keywords:
Abstract: Discovering frequent itemsets is essential for finding association rules, yet it is too computationally expensive with existing algorithms. It is even more challenging to find frequent itemsets in streaming numeric data. The streaming characteristic means that the data cannot be scanned repeatedly. The numeric characteristic requires that the data be pre-processed into itemsets; e.g., fuzzy-set methods can transform numeric data into itemsets with non-integer membership values. As a result, the frequencies of itemsets are usually not integers. To overcome such challenges, fast methods and stream-processing methods have been applied. However, the existing algorithms either still need to re-visit some previous data multiple times, or cannot count non-integer frequencies. Those algorithms that re-visit previous data sacrifice large memory spaces to cache those data and avoid repetitive scanning. When dealing with big streaming data nowadays, such a large-memory requirement often goes beyond the capacity of many computers. Those algorithms that are unable to count non-integer frequencies would be very inaccurate in estimating the non-integer frequencies of frequent itemsets if used with an integer approximation of frequency counting. To solve the aforementioned issues, in this paper we propose two incremental schemes for frequent itemsets discovery that are capable of working efficiently on streaming numeric data. In particular, they are able to count non-integer frequencies without re-visiting any previous data. The key to the efficiency of our schemes is to extract essential statistics that occupy much less memory than the raw data do for the ongoing streaming data. This grants our schemes the advantages of 1) allowing non-integer counting and thus natural integration with a fuzzy-set discretization method to boost robustness and anti-noise capability for numeric data, 2) enabling the design of a decay ratio for different data distributions, which can be adapted to three general stream models: landmark, damped and sliding windows, and 3) achieving highly accurate fuzzy-itemset discovery with efficient stream processing. Experimental studies demonstrate the efficiency and effectiveness of our dual schemes on both synthetic and real-world datasets.
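
The ideas named in the abstract (fuzzy discretization, non-integer frequency counting, and a decay ratio applied in a single pass) can be illustrated with a minimal Python sketch. This is not the authors' dual schemes, which the abstract does not specify. The function `fuzzify`, the class `DecayedFuzzyCounter`, the triangular membership functions, and the use of `min` to combine memberships are all assumptions chosen for illustration. A decay of 1.0 corresponds to a landmark-style count and a decay below 1.0 to a damped-style count; sliding windows would need extra bookkeeping that is not shown.

```python
from collections import defaultdict
from itertools import combinations


def fuzzify(value, centers):
    """Map a number to {label: membership} using triangular fuzzy sets.

    `centers` maps each label to the point where its membership is 1.0
    (a hypothetical partition, not taken from the paper).
    """
    labels = sorted(centers, key=centers.get)
    if value <= centers[labels[0]]:
        return {labels[0]: 1.0}
    if value >= centers[labels[-1]]:
        return {labels[-1]: 1.0}
    for lo, hi in zip(labels, labels[1:]):
        a, b = centers[lo], centers[hi]
        if a <= value <= b:
            w = (value - a) / (b - a)
            return {k: m for k, m in ((lo, 1.0 - w), (hi, w)) if m > 0.0}


class DecayedFuzzyCounter:
    """Keeps only per-itemset fuzzy supports, never the raw records."""

    def __init__(self, decay=1.0, max_size=2):
        self.decay = decay          # 1.0: landmark-style, <1.0: damped-style
        self.max_size = max_size    # largest itemset size to track
        self.support = defaultdict(float)
        self.weight = 0.0           # decayed number of records seen

    def update(self, record):
        """record: {attribute: {label: membership}} for one stream element."""
        if self.decay != 1.0:
            for key in self.support:
                self.support[key] *= self.decay
        self.weight = self.weight * self.decay + 1.0
        items = [((attr, label), m)
                 for attr, fuzzy in record.items()
                 for label, m in fuzzy.items()]
        for size in range(1, self.max_size + 1):
            for combo in combinations(items, size):
                attrs = {item[0] for item, _ in combo}
                if len(attrs) < size:
                    continue  # at most one label per attribute in an itemset
                key = frozenset(item for item, _ in combo)
                # Assumption: min combines memberships into a fuzzy count.
                self.support[key] += min(m for _, m in combo)

    def frequent(self, min_support):
        """Return itemsets whose relative fuzzy support meets the threshold."""
        return {k: v / self.weight for k, v in self.support.items()
                if v / self.weight >= min_support}
```

Each call to `update` touches only the stored statistics, so earlier stream elements never need to be cached or re-read; the cost of this naive version is that it enumerates every itemset up to `max_size`, which a practical scheme would prune.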