Scalable kNN search on vertically stored time series

作者: Shrikant Kashyap , Panagiotis Karras

DOI: 10.1145/2020408.2020607

关键词: Pruning (decision trees)Search engine indexingArtificial intelligenceCurse of dimensionalityData miningComputer sciencek-nearest neighbors algorithmFeature vectorIndex (publishing)ScalabilityNearest neighbor searchMachine learning

摘要: Nearest-neighbor search over time series has received vast research attention as a basic data mining task. Still, none of the hitherto proposed methods scales well with increasing time-series length. This is due to fact that all provide an one-off pruning capacity only. In particular, traditional utilize index in reduced-dimensionality feature space; however, for high length, such yields many false hits need be eliminated by accessing full records. An attempt reduce indexing more features exacerbates curse dimensionality, and vice versa. A recently alternative, iSAX, uses symbolic approximate representations accessed simple file-system directory index. iSAX also encounters hits, which are again records full: once hit generated index, there no second chance prune it; thus, provides one-off. paper proposes alternative approach kNN search, following nontraditional style. Instead navigating through candidate via we access their features, obtained multi-resolution transform, stepwise sequential-scan manner, one level resolution at time, vertical representation. Most candidates progressively after few terms accessed, using pre-computed information unprecedentedly tight double-bounding scheme, involving not only lower, but upper distance bounds. Our experimental study large, high-length confirms advantage our both current state-of-the-art method, classical index-based methods.

参考文章(55)
Rakesh Agrawal, Christos Faloutsos, Arun Swami, None, Efficient Similarity Search In Sequence Databases FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms. pp. 69- 84 ,(1993) , 10.1007/3-540-57301-1_5
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Similarity Search in High Dimensions via Hashing very large data bases. pp. 518- 529 ,(1999)
Hans-Jörg Schek, Stephen Blott, Roger Weber, A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces very large data bases. pp. 194- 205 ,(1998)
Xiang Lian, Jeffrey Xu Yu, Yunhao Liu, Lei Chen, Qiuxia Chen, Indexable PLA for efficient similarity search very large data bases. pp. 435- 446 ,(2007)
H. V. Jagadish, Beng Chin Ooi, Cui Yu, Kian-Lee Tan, Indexing the Distance: An Efficient Method to KNN Processing very large data bases. pp. 421- 430 ,(2001)
Pierre Geurts, Pattern Extraction for Time Series Classification european conference on principles of data mining and knowledge discovery. pp. 115- 127 ,(2001) , 10.1007/3-540-44794-6_10
Christos Faloutsos, Flip Korn, Zenon Protopapas, Nikolaos Sidiropoulos, Eliot Siegel, Fast nearest neighbor search in medical image databases very large data bases. pp. 215- 226 ,(1996)
Gísli R. Hjaltason, Hanan Samet, Distance browsing in spatial databases ACM Transactions on Database Systems. ,vol. 24, pp. 265- 318 ,(1999) , 10.1145/320248.320255
Yuhan Cai, Raymond Ng, Indexing spatio-temporal trajectories with Chebyshev polynomials international conference on management of data. pp. 599- 610 ,(2004) , 10.1145/1007568.1007636
H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang, iDistance: An adaptive B+-tree based indexing method for nearest neighbor search ACM Transactions on Database Systems. ,vol. 30, pp. 364- 397 ,(2005) , 10.1145/1071610.1071612