Fast Outlier Detection in High Dimensional Spaces

作者: Fabrizio Angiulli , Clara Pizzuti

DOI: 10.1007/3-540-45681-3_2

关键词: Data setPoint (geometry)Solution setTime complexityk-nearest neighbors algorithmCombinatoricsMathematicsCurse of dimensionalityHilbert curveOutlier

摘要: In this paper we propose a new definition of distance-based outlier that considers for each point the sum of the distances from its k nearest neighbors, called weight. Outliers are those points having the largest values of weight. In order to compute these weights, we find the k nearest neighbors of each point in a fast and efficient way by linearizing the search space through the Hilbert space filling curve. The algorithm consists of two phases, the first provides an approximated solution, within a small factor, after executing at most d+ 1 scans …

参考文章(20)
Carla E. Brodley, Mark A. Friedl, Identifying and eliminating mislabeled training instances national conference on artificial intelligence. pp. 799- 805 ,(1996)
Raymond T. Ng, Edwin M. Knorr, Algorithms for Mining Distance-Based Outliers in Large Datasets very large data bases. pp. 392- 403 ,(1998)
Prabhakar Raghavan, Andreas Arning, Rakesh Agrawal, A linear method for deviation detection in large databases knowledge discovery and data mining. pp. 164- 169 ,(1996)
Sunita Sarawagi, Rakesh Agrawal, Nimrod Megiddo, Discovery-Driven Exploration of OLAP Data Cubes extending database technology. pp. 168- 182 ,(1998) , 10.1007/BFB0100984
Swanwa Liao, Mario Alberto López, Finding k-Closest-Pairs Efficiently for High Dimensional Data. canadian conference on computational geometry. ,(2000)
Kenji Yamanishi, Jun-ichi Takeuchi, Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner knowledge discovery and data mining. pp. 389- 394 ,(2001) , 10.1145/502512.502570
Vic Barnett, Toby Lewis, Outliers in Statistical Data ,(1978)
C. Faloutsos, S. Roseman, Fractals for secondary key retrieval symposium on principles of database systems. pp. 247- 252 ,(1989) , 10.1145/73721.73746
Kenji Yamanishi, Jun-Ichi Takeuchi, Graham Williams, Peter Milne, On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms knowledge discovery and data mining. pp. 320- 324 ,(2000) , 10.1145/347090.347160