PRESEE: An MDL/MML Algorithm to Time-Series Stream Segmenting

Authors: Kaikuo Xu, Yexi Jiang, Mingjie Tang, Changan Yuan, Changjie Tang

DOI: 10.1155/2013/386180

Abstract: Time-series stream is one of the most common data types in the data mining field. It is prevalent in fields such as the stock market, ecology, and medical care. Segmentation is a key step in accelerating the processing speed of time-series stream mining. Previous segmentation algorithms mainly focused on improving precision instead of paying much attention to efficiency. Moreover, the performance of these algorithms depends heavily on parameters, which are hard for users to set. In this paper, we propose PRESEE (a parameter-free, real-time, and scalable time-series stream segmenting algorithm), which greatly improves the efficiency of time-series stream segmenting. PRESEE is based on both MDL (minimum description length) and MML (minimum message length) methods, which allow it to segment the data automatically. To evaluate PRESEE, we conduct several experiments on time-series streams of different types and compare it with the state-of-the-art algorithm. The empirical results show that PRESEE is very efficient on real-time stream datasets, improving efficiency nearly tenfold. The novelty of the algorithm is further demonstrated by an application to real-time stream data from the ChinaFLUX sensor networks.
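The abstract does not spell out PRESEE's coding scheme. As a rough illustration of how an MDL criterion can pick the number of segments without a user-set threshold, here is a minimal Python sketch under stated assumptions: piecewise-constant segments, Gaussian residuals with one shared variance, and a BIC-style model cost of ½·log2(n) bits per segment mean plus log2(n) bits per cut position. It first finds the lowest-error segmentation for each k with the classic dynamic program, then keeps the k with the shortest total description length. This is a generic two-part MDL sketch, not PRESEE itself, and it is an O(k_max·n²) batch method rather than a real-time streaming one; `mdl_segment`, `best_k_segmentations`, and the `eps` floor are hypothetical names introduced here.

```python
import math
from typing import Callable, List, Optional, Tuple

INF = float("inf")


def _sse_table(x: List[float]) -> Callable[[int, int], float]:
    """Prefix sums so the squared error of x[i:j] around its mean costs O(1)."""
    s, sq = [0.0], [0.0]
    for v in x:
        s.append(s[-1] + v)
        sq.append(sq[-1] + v * v)

    def sse(i: int, j: int) -> float:
        total = s[j] - s[i]
        # Clamp tiny negative values caused by floating-point rounding.
        return max(0.0, (sq[j] - sq[i]) - total * total / (j - i))

    return sse


def best_k_segmentations(x: List[float], k_max: int) -> Tuple[list, list]:
    """Classic dynamic program: cost[k][j] is the minimum total SSE of
    splitting x[:j] into k piecewise-constant segments; back[k][j] is the
    start index of the last segment in that optimal split."""
    n = len(x)
    sse = _sse_table(x)
    cost = [[INF] * (n + 1) for _ in range(k_max + 1)]
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    cost[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i
    return cost, back


def mdl_segment(x: List[float], k_max: Optional[int] = None,
                eps: float = 1e-12) -> List[int]:
    """Return the cut positions (segment start indices, excluding 0) of the
    segmentation with the shortest two-part description length.
    Assumption: BIC-style code lengths, constant terms dropped."""
    n = len(x)
    if n == 0:
        raise ValueError("empty series")
    if k_max is None:
        k_max = max(1, n // 2)  # cap k so RSS cannot trivially reach zero
    cost, back = best_k_segmentations(x, k_max)

    best_k, best_dl = 1, INF
    for k in range(1, k_max + 1):
        rss = max(cost[k][n], eps)  # floor avoids log2(0) on exact fits
        # Data part: Gaussian code length with shared variance rss / n.
        data_bits = 0.5 * n * math.log2(rss / n)
        # Model part: 1/2 log2(n) bits per mean, log2(n) bits per cut.
        model_bits = 0.5 * k * math.log2(n) + (k - 1) * math.log2(n)
        if data_bits + model_bits < best_dl:
            best_k, best_dl = k, data_bits + model_bits

    # Walk the back-pointers to recover the cut positions.
    cuts, j = [], n
    for k in range(best_k, 0, -1):
        j = back[k][j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts)
```

For example, `mdl_segment([0.0] * 20 + [5.0] * 20 + [1.0] * 20)` returns `[20, 40]`, and a constant series yields no cuts. The only tunable value is the numerical `eps` floor; the number of segments falls out of the trade-off between residual bits and model bits, which is the sense in which MDL-style segmenters can be called parameter-free.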