Mining thick skylines over large databases

Authors: Wen Jin, Jiawei Han, Martin Ester

DOI: 10.1007/978-3-540-30116-5_25

Abstract: A new operator called skyline [3] has recently attracted interest. It returns the objects that are not dominated by any other object with regard to certain measures in a multi-dimensional space. Recent work on the skyline operator [3,15,8,13,2] focuses on efficient computation of skylines in large databases. However, such work gives users only thin skylines, i.e., single objects, which may not be desirable in some real applications. In this paper, we propose a novel concept, called thick skyline, which recommends not only skyline objects but also their nearby neighbors within ε-distance. Efficient computation methods are developed, including (1) two efficient algorithms, Sampling-and-Pruning and Indexing-and-Estimating, which find such a thick skyline with the help of statistics or indexes in large databases, and (2) a highly efficient Microcluster-based algorithm for mining the thick skyline. The Microcluster-based method not only leads to substantial savings in computation but also provides a concise representation of the thick skyline in the case of high cardinalities. Our experimental performance study shows that the proposed methods are both efficient and effective.
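The two definitions in the abstract can be illustrated with a brute-force sketch. This is not one of the paper's algorithms (Sampling-and-Pruning, Indexing-and-Estimating, or the Microcluster-based method); it is a quadratic-time illustration only. It assumes that smaller values are better in every dimension and that "ε-distance" means Euclidean distance. The function names and the sample data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dominates(p, q):
    """Assumption: smaller is better. p dominates q if p is no worse in
    every dimension and strictly better in at least one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Thin skyline: the points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def thick_skyline(points, eps):
    """Skyline points plus the non-skyline points lying within eps
    (Euclidean, an assumption) of at least one skyline point."""
    sky = skyline(points)
    sky_set = set(sky)
    near = [p for p in points
            if p not in sky_set and any(dist(p, s) <= eps for s in sky)]
    return sky, near


# Hypothetical (price, distance-to-beach) pairs; lower is better in both.
hotels = [(1.0, 9.0), (2.0, 4.0), (2.5, 4.2), (4.0, 3.0), (6.0, 1.0), (7.0, 7.0)]
sky, near = thick_skyline(hotels, eps=1.0)
print(sky)   # [(1.0, 9.0), (2.0, 4.0), (4.0, 3.0), (6.0, 1.0)]
print(near)  # [(2.5, 4.2)]
```

Here (2.5, 4.2) is dominated by (2.0, 4.0) and so is absent from the thin skyline, but it lies within ε = 1.0 of that skyline point and is therefore recommended as part of the thick skyline. The point (7.0, 7.0) is dominated and far from every skyline point, so it is excluded.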

References (18)
Wolf-Tilo Balke, Ulrich Güntzer, Jason Xin Zheng. Efficient Distributed Skylining for Web Information Systems. Extending Database Technology, vol. 2992, pp. 256-273 (2004). DOI: 10.1007/978-3-540-24741-8_16
Alexander Hinneburg, Daniel A. Keim. Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. Very Large Data Bases, pp. 506-517 (1999).
Beng Chin Ooi, Pin-Kwang Eng, Kian-Lee Tan. Efficient Progressive Skyline Computation. Very Large Data Bases, pp. 301-310 (2001).
Kenneth L. Clarkson, Jon L. Bentley, David B. Levine. Fast linear expected-time algorithms for computing maxima and convex hulls. Symposium on Discrete Algorithms, pp. 179-187 (1990). DOI: 10.5555/320176.320196
Michael I. Shamos, Franco P. Preparata. Computational Geometry: An Introduction (1978).
Ivan Stojmenović, Masahiro Miyakawa. An optimal parallel algorithm for solving the maximal elements problem in the plane. Parallel Computing, vol. 7, pp. 249-251 (1988). DOI: 10.1016/0167-8191(88)90042-7
Jiří Matoušek. Computing dominances in E^n (short communication). Information Processing Letters, vol. 38, pp. 277-278 (1991). DOI: 10.1016/0020-0190(91)90071-O
H. T. Kung, F. Luccio, F. P. Preparata. On Finding the Maxima of a Set of Vectors. Journal of the ACM, vol. 22, pp. 469-476 (1975). DOI: 10.1145/321906.321910
Wen Jin, Anthony K. H. Tung, Jiawei Han. Mining top-n local outliers in large databases. Knowledge Discovery and Data Mining, pp. 293-298 (2001). DOI: 10.1145/502512.502554
Franck Nielsen. Output-sensitive peeling of convex and maximal layers. Information Processing Letters, vol. 59, pp. 255-259 (1996). DOI: 10.1016/0020-0190(96)00116-0