Efficient algorithms for mining outliers from large data sets

Authors: Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim

DOI: 10.1145/335191.335437

Keywords: Disjoint sets, Partition (database), Computer science, Local outlier factor, Curse of dimensionality, Outlier, Ranking, Data set, Data mining, k-nearest neighbors algorithm, Computation

Abstract: … can handle large data sets. Among algorithms with lower complexities is the pre-clustering phase of BIRCH [ZRL96], a state-of-the-art clustering algorithm that can handle large data sets…

References (20)
Raymond T. Ng, Edwin M. Knorr. Algorithms for Mining Distance-Based Outliers in Large Datasets. Very Large Data Bases, pp. 392-403 (1998)
Heikki Mannila, A. Inkeri Verkamo, Ramakrishnan Srikant, Hannu Toivonen, Rakesh Agrawal. Fast Discovery of Association Rules. Knowledge Discovery and Data Mining, pp. 307-328 (1996)
Prabhakar Raghavan, Andreas Arning, Rakesh Agrawal. A Linear Method for Deviation Detection in Large Databases. Knowledge Discovery and Data Mining, pp. 164-169 (1996)
Sunita Sarawagi, Rakesh Agrawal, Nimrod Megiddo. Discovery-Driven Exploration of OLAP Data Cubes. Extending Database Technology, pp. 168-182 (1998). DOI: 10.1007/BFB0100984
Raymond T. Ng, Jiawei Han. Efficient and Effective Clustering Methods for Spatial Data Mining. Very Large Data Bases, pp. 144-155 (1994)
Raymond T. Ng, Edwin M. Knorr. Finding Intensional Knowledge of Distance-Based Outliers. Very Large Data Bases, pp. 211-222 (1999)
Vic Barnett, Toby Lewis. Outliers in Statistical Data (1978)
Richard C. Dubes, Anil K. Jain. Algorithms for Clustering Data (1988)
Phillip B. Gibbons, Yossi Matias. New Sampling-Based Summary Statistics for Improving Approximate Query Answers. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD '98), vol. 27, pp. 331-342 (1998). DOI: 10.1145/276304.276334
Michael I. Shamos, Franco P. Preparata. Computational Geometry: An Introduction (1978)