HOT aSAX: a novel adaptive symbolic representation for time series discords discovery

Authors: Ninh D. Pham, Quang Loc Le, Tran Khanh Dang

DOI: 10.1007/978-3-642-12145-6_12

Keywords: Data mining, Cluster analysis, Computer science, Data cleansing, Series (mathematics), Pruning (decision trees), Time series database, Gaussian, Anomaly detection, Representation (mathematics)

Abstract: Finding discords in time series databases has been an important problem over the last decade due to its variety of real-world applications, including data cleansing, fault diagnostics, and financial analysis. To our knowledge, the best known approach is the HOT SAX technique, which is based on the equiprobable distribution of SAX representations of time series. This characteristic, however, is not preserved in reduced-dimensionality representations, especially for datasets that lack a Gaussian distribution. In this paper, we introduce a k-means-based algorithm for the symbolic representation of time series, called adaptive Symbolic Aggregate approXimation (aSAX), and propose HOT aSAX for time series discord discovery. Due to the clustered characteristic of aSAX words, HOT aSAX produces greater pruning power than the previous approach. Our empirical experiments with time series datasets confirm the theoretical analyses as well as the efficiency of the proposed approach.
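The contrast the abstract draws, between equiprobable Gaussian cut points (SAX) and cut points learned by k-means (aSAX), can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it (`adaptive_breakpoints`, `hot_sax_discord`, and so on) is hypothetical. It assumes z-normalized Euclidean distance between subsequences of length `n`, a PAA segment count `w` that divides `n`, SAX cut points taken from standard normal quantiles for an alphabet of size `a`, and, for aSAX, cut points refined by Lloyd-style 1-D k-means over the PAA coefficients of all subsequences, starting from the Gaussian cut points. The search follows the HOT SAX heuristic only in outline (rare words first in the outer loop, same-word neighbours first in the inner loop, early abandoning); unlike the published heuristic, the remaining inner candidates are visited sequentially rather than in random order.

```python
# Hypothetical sketch; breakpoint learning and loop order are assumptions.
import bisect
import itertools
import math
from collections import defaultdict
from statistics import NormalDist


def znorm(x):
    """Z-normalize a subsequence; a (near-)flat one maps to all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd < 1e-8 else [(v - mu) / sd for v in x]


def paa(x, w):
    """Piecewise Aggregate Approximation; assumes len(x) is divisible by w."""
    seg = len(x) // w
    return [sum(x[k * seg:(k + 1) * seg]) / seg for k in range(w)]


def gaussian_breakpoints(a):
    """SAX cut points: a-1 quantiles giving equiprobable cells under N(0, 1)."""
    return [NormalDist().inv_cdf(k / a) for k in range(1, a)]


def adaptive_breakpoints(coeffs, init, iters=100, tol=1e-9):
    """Lloyd-style 1-D k-means: cut points move to midpoints of cell centroids."""
    bps = list(init)
    for _ in range(iters):
        cells = [[] for _ in range(len(bps) + 1)]
        for c in coeffs:
            cells[bisect.bisect_right(bps, c)].append(c)
        # Empty cell: midpoint of its current interval (clamped at the ends).
        cents = [sum(cell) / len(cell) if cell else
                 (bps[max(k - 1, 0)] + bps[min(k, len(bps) - 1)]) / 2
                 for k, cell in enumerate(cells)]
        new = [(cents[k] + cents[k + 1]) / 2 for k in range(len(bps))]
        if max(abs(p - q) for p, q in zip(new, bps)) < tol:
            return new
        bps = new
    return bps


def word(x, w, bps):
    """Symbolize a z-normalized subsequence as a tuple of cell indices."""
    return tuple(bisect.bisect_right(bps, c) for c in paa(x, w))


def hot_sax_discord(series, n, w, bps):
    """Top-1 discord of length n; returns (location, distance, distance calls)."""
    m = len(series) - n + 1
    subs = [znorm(series[i:i + n]) for i in range(m)]
    words = [word(s, w, bps) for s in subs]
    buckets = defaultdict(list)
    for i, wd in enumerate(words):
        buckets[wd].append(i)
    outer = sorted(range(m), key=lambda i: len(buckets[words[i]]))  # rarest first
    best_loc, best_dist, calls = -1, -1.0, 0
    for i in outer:
        nn = math.inf
        # Inner loop: same-word subsequences first (likely near), then the rest.
        others = (j for j in range(m) if words[j] != words[i])
        for j in itertools.chain(buckets[words[i]], others):
            if abs(i - j) < n:  # skip trivial, self-overlapping matches
                continue
            calls += 1
            nn = min(nn, math.dist(subs[i], subs[j]))
            if nn < best_dist:  # i cannot be the discord: abandon it early
                break
        if best_dist < nn < math.inf:
            best_loc, best_dist = i, nn
    return best_loc, best_dist, calls


def brute_force_discord(series, n):
    """Reference: subsequence with the largest nearest non-self-match distance."""
    m = len(series) - n + 1
    subs = [znorm(series[i:i + n]) for i in range(m)]
    best_loc, best_dist = -1, -1.0
    for i in range(m):
        nn = min((math.dist(subs[i], subs[j]) for j in range(m) if abs(i - j) >= n),
                 default=math.inf)
        if best_dist < nn < math.inf:
            best_loc, best_dist = i, nn
    return best_loc, best_dist


if __name__ == "__main__":
    s = [math.sin(2 * math.pi * t / 25) + (1.5 if 150 <= t < 160 else 0.0)
         for t in range(300)]
    sax_bps = gaussian_breakpoints(4)
    coeffs = [c for i in range(276) for c in paa(znorm(s[i:i + 25]), 5)]
    asax_bps = adaptive_breakpoints(coeffs, sax_bps)
    for name, bps in (("SAX", sax_bps), ("aSAX", asax_bps)):
        print(name, hot_sax_discord(s, 25, 5, bps))
    print("brute force", brute_force_discord(s, 25))
```

Running the module searches a period-25 sine wave with one injected bump at t = 150..159, once with each set of cut points and once by brute force. Because a candidate is abandoned only after its running nearest-neighbour distance drops below the best discord distance found so far, either set of cut points should return the same discord distance as `brute_force_discord`; what differs is the `calls` counter, a rough proxy for pruning power, which depends on how well the symbols group similar subsequences.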

References (8)
Yingyi Bu, Jian Pei, Ada Wai-Chee Fu, Eamonn J. Keogh, Oscar Tat-Wing Leung, Sam Meshkin. WAT: Finding Top-K Discords in Time Series Database. SIAM International Conference on Data Mining, pp. 449-454 (2007).
Kin-Pong Chan, Ada Wai-Chee Fu. Efficient time series matching by wavelets. International Conference on Data Engineering, pp. 126-133 (1999). DOI: 10.1109/ICDE.1999.754915
Eamonn Keogh, Kaushik Chakrabarti, Michael Pazzani, Sharad Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, vol. 3, pp. 263-286 (2001). DOI: 10.1007/PL00011669
E. Keogh, J. Lin, A. Fu. HOT SAX: efficiently finding the most unusual time series subsequence. International Conference on Data Mining, pp. 226-233 (2005). DOI: 10.1109/ICDM.2005.79
J. B. MacQueen. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281-297 (1967).
Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos. Fast subsequence matching in time-series databases. Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD '94), vol. 23, pp. 419-429 (1994). DOI: 10.1145/191839.191925
S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, vol. 28, pp. 129-137 (1982). DOI: 10.1109/TIT.1982.1056489
Jessica Lin, Eamonn Keogh, Li Wei, Stefano Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Mining and Knowledge Discovery, vol. 15, pp. 107-144 (2007). DOI: 10.1007/S10618-007-0064-Z