On Efficient Processing of Subspace Skyline Queries on High Dimensional Data

Authors: Wen Jin, Anthony K. H. Tung, Martin Ester, Jiawei Han

DOI: 10.1109/SSDBM.2007.20

Keywords: Pruning (decision trees), Curse of dimensionality, Redundancy (engineering), Clustering high-dimensional data, Subspace topology, Linear subspace, Computer science, Set (abstract data type), Theoretical computer science, Skyline

Abstract: Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focuses on pre-materializing a set of skyline points in various subspaces, while the second focuses on answering the queries dynamically by using a set of anchors to prune off skyline points through spatial reasoning. Despite efforts to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remains exponential in the number of dimensions. The query time for the second approach, on the other hand, also grows substantially for data with higher dimensionality, where the pruning power of anchors becomes much weaker. In this paper, we propose methods for answering subspace skyline queries on high dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space, and use these concepts as the foundation for answering subspace skyline queries on high dimensional data. Query processing in our approach involves mostly simple pruning operations, while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrate the efficiency and effectiveness of our methods.
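
For readers unfamiliar with the query type the abstract refers to, the following is a minimal sketch of a naive subspace skyline computation. It is not the method proposed in the paper: it is the quadratic pairwise baseline, with no pre-materialization, anchors, or pruning. The function names `dominates` and `subspace_skyline`, the sample data, and the convention that smaller values are preferred are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """Return True if p dominates q in the subspace given by dims.

    Assumption: smaller is better. p dominates q when p is no worse
    on every queried dimension and strictly better on at least one.
    """
    no_worse = all(p[d] <= q[d] for d in dims)
    strictly_better = any(p[d] < q[d] for d in dims)
    return no_worse and strictly_better


def subspace_skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Naive O(n^2) skyline: keep points not dominated by any other point."""
    return [
        p for p in points
        if not any(dominates(q, p, dims) for q in points)
    ]


# Hypothetical data: (price, distance) pairs, lower is better on both.
hotels = [(50, 8), (60, 3), (70, 5), (40, 9)]

full_space = subspace_skyline(hotels, dims=[0, 1])
price_only = subspace_skyline(hotels, dims=[0])
distance_only = subspace_skyline(hotels, dims=[1])
```

With d dimensions there are 2^d - 1 non-empty subspaces that a query may ask for, which is why storing one skyline per subspace grows exponentially, as the abstract notes.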

References (23)
Jon Kleinberg, Christos Papadimitriou, Prabhakar Raghavan. A Microeconomic View of Data Mining. Data Mining and Knowledge Discovery, vol. 2, pp. 311-324 (1998). 10.1023/A:1009726428407
Jarek Gryz, Ryan Shipley, Parke Godfrey. Maximal vector computation in large data sets. Very Large Data Bases, pp. 229-240 (2005).
Wolf-Tilo Balke, Ulrich Güntzer, Jason Xin Zheng. Efficient Distributed Skylining for Web Information Systems. Extending Database Technology, vol. 2992, pp. 256-273 (2004). 10.1007/978-3-540-24741-8_16
Beng Chin Ooi, Pin-Kwang Eng, Kian-Lee Tan. Efficient Progressive Skyline Computation. Very Large Data Bases, pp. 301-310 (2001).
Qiang Jing, Rui Yang, Panos Kalnis, Anthony K. H. Tung. Localized signature table: fast similarity search on transaction data. Conference on Information and Knowledge Management, pp. 314-323 (2004). 10.1145/1031171.1031237
Cuiping Li, Beng Chin Ooi, Anthony K. H. Tung, Shan Wang. DADA: a data cube for dominant relationship analysis. International Conference on Management of Data, pp. 659-670 (2006). 10.1145/1142473.1142547
Chee-Yong Chan, Pin-Kwang Eng, Kian-Lee Tan. Stratified computation of skylines with partially-ordered domains. Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05, pp. 203-214 (2005). 10.1145/1066157.1066181
H. T. Kung, F. Luccio, F. P. Preparata. On Finding the Maxima of a Set of Vectors. Journal of the ACM, vol. 22, pp. 469-476 (1975). 10.1145/321906.321910
Zhenjie Zhang, Xinyu Guo, Hua Lu, Anthony K. H. Tung, Nan Wang. Discovering strong skyline points in high dimensional spaces. Proceedings of the 14th ACM international conference on Information and knowledge management - CIKM '05, pp. 247-248 (2005). 10.1145/1099554.1099610
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger. An optimal and progressive algorithm for skyline queries. International Conference on Management of Data, pp. 467-478 (2003). 10.1145/872757.872814