作者: Fabrizio Angiulli , Clara Pizzuti
关键词: Data set 、 Point (geometry) 、 Solution set 、 Time complexity 、 k-nearest neighbors algorithm 、 Combinatorics 、 Mathematics 、 Curse of dimensionality 、 Hilbert curve 、 Outlier
摘要: In this paper we propose a new definition of distance-based outlier that considers for each point the sum of the distances from its k nearest neighbors, called weight. Outliers are those points having the largest values of weight. In order to compute these weights, we find the k nearest neighbors of each point in a fast and efficient way by linearizing the search space through the Hilbert space filling curve. The algorithm consists of two phases, the first provides an approximated solution, within a small factor, after executing at most d+ 1 scans …