StatStream: statistical monitoring of thousands of data streams in real time

作者: Yunyue Zhu , Dennis Shasha

DOI: 10.1016/B978-155860869-6/50039-1

关键词: Time seriesData structureFourier transformInterval (mathematics)Discrete Fourier transformSliding window protocolComputer scienceSynthetic dataComputationData stream miningData mining

摘要: Consider the problem of monitoring tens thousands time series data streams in an online fashion and making decisions based on them. In addition to single stream statistics such as average standard deviation, we also want find high correlations among all pairs streams. A stock market trader might use a tool spot arbitrage opportunities. This paper proposes efficient methods for solving this Discrete Fourier Transforms three level interval hierarchy. Extensive experiments synthetic real world financial trading show that our algorithm beats direct computation approach by several orders magnitude. It improves previous Transform approaches allowing time-delayed correlation over any size sliding window delay. Correlation lends itself grid-based structure. The result is first know compute time. incremental, has fixed response time, can monitor pairwise 10,000 PC. embarrassingly parallelizable.

参考文章(26)
Rakesh Agrawal, Christos Faloutsos, Arun Swami, None, Efficient Similarity Search In Sequence Databases FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms. pp. 69- 84 ,(1993) , 10.1007/3-540-57301-1_5
Dina Q. Goldin, Paris C. Kanellakis, On Similarity Queries for Time-Series Data: Constraint Specification and Implementation principles and practice of constraint programming. pp. 137- 153 ,(1995) , 10.1007/3-540-60299-2_9
Corinna Cortes, Kathleen Fisher, Daryl Pregibon, Anne Rogers, Hancock: a language for extracting signatures from data streams knowledge discovery and data mining. pp. 9- 17 ,(2000) , 10.1145/347090.347094
Yi-Leh Wu, Divyakant Agrawal, Amr El Abbadi, A comparison of DFT and DWT based similarity search in time-series databases Proceedings of the ninth international conference on Information and knowledge management - CIKM '00. pp. 488- 495 ,(2000) , 10.1145/354756.354857
Mayur Datar, Aristides Gionis, Piotr Indyk, Rajeev Motwani, Maintaining Stream Statistics over Sliding Windows SIAM Journal on Computing. ,vol. 31, pp. 1794- 1813 ,(2002) , 10.1137/S0097539701398363
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Mayur Datar, Maintaining stream statistics over sliding windows: (extended abstract) symposium on discrete algorithms. pp. 635- 644 ,(2002) , 10.5555/545381.545466
Jon Louis Bentley, Bruce W Weide, Andrew C Yao, None, Optimal Expected-Time Algorithms for Closest Point Problems ACM Transactions on Mathematical Software. ,vol. 6, pp. 563- 580 ,(1980) , 10.1145/355921.355927
Byoung-Kee Yi, Christos Faloutsos, Fast Time Sequence Indexing for Arbitrary Lp Norms very large data bases. pp. 385- 394 ,(2000)
Kin-Pong Chan, Ada Wai-Chee Fu, None, Efficient time series matching by wavelets international conference on data engineering. pp. 126- 133 ,(1999) , 10.1109/ICDE.1999.754915
Davood Rafiei, Alberto Mendelzon, Similarity-based queries for time series data international conference on management of data. ,vol. 26, pp. 13- 25 ,(1997) , 10.1145/253260.253264