SUBSKY: Efficient Computation of Skylines in Subspaces

作者: Yufei Tao , Xiaokui Xiao , Jian Pei

DOI: 10.1109/ICDE.2006.149

关键词:

摘要: Given a set of multi-dimensional points, the skyline contains best points according to any preference function that is monotone on all axes. In practice, applications require analysis usually provide numerous candidate attributes, and various users depending their interests may issue queries regarding different (small) subsets dimensions. Formally, given relation with large number (e.g.,ge 10) query aims at finding in an arbitrary subspace low dimensionality (e.g., 2). The existing algorithms do not support retrieval efficiently because they (i) scanning entire database least once, or (ii) are optimized for one particular but incur significant overhead other subspaces. this paper, we propose technique SUBSKY which settles problem using single B-tree, can be implemented relational database. core transformation converts data 1D values, enables several effective pruning heuristics. Extensive experiments real confirm outperforms alternative approaches significantly both efficiency scalability.

参考文章(21)
Jarek Gryz, Ryan Shipley, Parke Godfrey, Maximal vector computation in large data sets very large data bases. pp. 229- 240 ,(2005)
Wolf-Tilo Balke, Ulrich Güntzer, Jason Xin Zheng, Efficient Distributed Skylining for Web Information Systems extending database technology. ,vol. 2992, pp. 256- 273 ,(2004) , 10.1007/978-3-540-24741-8_16
Beng Chin Ooi, Pin-Kwang Eng, Kian-Lee Tan, Efficient Progressive Skyline Computation very large data bases. pp. 301- 310 ,(2001)
Dan Pelleg, Andrew W. Moore, X-means: Extending K-means with Efficient Estimation of the Number of Clusters international conference on machine learning. pp. 727- 734 ,(2000)
Parke Godfrey, Skyline Cardinality for Relational Processing foundations of information and knowledge systems. pp. 78- 97 ,(2004) , 10.1007/978-3-540-24627-5_7
Gísli R. Hjaltason, Hanan Samet, Distance browsing in spatial databases ACM Transactions on Database Systems. ,vol. 24, pp. 265- 318 ,(1999) , 10.1145/320248.320255
Chee-Yong Chan, Pin-Kwang Eng, Kian-Lee Tan, Stratified computation of skylines with partially-ordered domains Proceedings of the 2005 ACM SIGMOD international conference on Management of data - SIGMOD '05. pp. 203- 214 ,(2005) , 10.1145/1066157.1066181
Nick Roussopoulos, Stephen Kelley, Frédéric Vincent, Nearest neighbor queries international conference on management of data. ,vol. 24, pp. 71- 79 ,(1995) , 10.1145/223784.223794
Stefan Berchtold, Christian Bohm, Hosagrahar V Jagadish, H-P Kriegel, Jörg Sander, Independent quantization: an index compression technique for high-dimensional data spaces international conference on data engineering. pp. 577- 588 ,(2000) , 10.1109/ICDE.2000.839456
Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh, Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS international conference on data engineering. ,vol. 1, pp. 555- 567 ,(1996) , 10.1023/A:1009726021843