Understanding episode mining techniques: Benchmarking on diverse, realistic, artificial data

Author: Albrecht Zimmermann

DOI: 10.3233/IDA-140668

Keywords: Task (project management), Public domain, Test data generation, Data mining, Semantics, Benchmarking, Temporal database, Noise (video), Computer science, Data stream mining

Abstract: Frequent episode mining has been proposed as a data mining task for recovering sequential patterns from temporal data sequences, and several approaches have been introduced over the last fifteen years. These techniques have, however, never been compared against each other in a large-scale comparison, mainly because existing real-life data is prevented from entering the public domain by non-disclosure agreements. We perform such a comparison for the first time. To get around the problem of proprietary data, we employ a data generator that is based on a number of real-life observations and is capable of generating data that mimics the real-life data at our disposal. Artificial data offers the additional advantage that the underlying patterns are known, which is typically not the case for real-life data. Thus, we can evaluate for the first time the ability of mining approaches to recover patterns embedded in noise. Our experiments indicate that temporal constraints are more important in affecting the effectiveness of episode mining than occurrence semantics. They also indicate that recovering underlying patterns when several phenomena are present at the same time is rather difficult, and that there is a need to develop better significance measures and techniques for dealing with sets of episodes.

References (27)
Heikki Mannila, A. Inkeri Verkamo, Hannu Toivonen. Discovering Frequent Episodes in Sequences. Knowledge Discovery and Data Mining, pp. 210-215 (1995)
Heikki Mannila, Hannu Toivonen. Discovering generalized episodes using minimal occurrences. Knowledge Discovery and Data Mining, pp. 146-151 (1996)
Ramakrishnan Srikant, Rakesh Agrawal. Fast Algorithms for Mining Association Rules in Large Databases. Very Large Data Bases, pp. 487-499 (1994)
Abdullah Mueen, Eamonn Keogh, Qiang Zhu, Sydney Cash, Brandon Westover. Exact Discovery of Time Series Motifs. SIAM International Conference on Data Mining, vol. 2009, pp. 473-484 (2009). DOI: 10.1137/1.9781611972795.41
M. Atallah, R. Gwadera, W. Szpankowski. Detection of significant sets of episodes in event sequences. International Conference on Data Mining, pp. 3-10 (2004). DOI: 10.1109/ICDM.2004.10090
Nicolas Méger, Christophe Rigotti. Constraint-based mining of episode rules and optimal window sizes. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 313-324 (2004). DOI: 10.1007/978-3-540-30116-5_30
Quentin F. Stout, David M. Pennock. Exploiting a theory of phase transitions in three-satisfiability problems. National Conference on Artificial Intelligence, pp. 253-258 (1996)
Gemma Casas-Garriga. Discovering Unbounded Episodes in Sequential Data. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 83-94 (2003). DOI: 10.1007/978-3-540-39804-2_10
Tijl De Bie. Explicit probabilistic models for databases and networks. arXiv: Artificial Intelligence (2009)
Albert Bifet, Ricard Gavaldà. Adaptive XML tree classification on evolving data streams. European Conference on Machine Learning, pp. 147-162 (2009). DOI: 10.1007/978-3-642-04180-8_27