Detecting outlying subspaces for high-dimensional data: a heuristic search approach

Author: Ji Zhang

DOI:

Keywords:

Abstract: In this paper, we identify a new task for studying the outlying degree of high-dimensional data, i.e. finding the subspaces (subsets of features) in which given points are outliers, and propose a novel detection algorithm, called High-D Outlying subspace Detection (HighDOD). We measure the outlying degree of a point using the sum of the distances between the point and its k nearest neighbors. Heuristic pruning strategies are proposed to realize fast search, and an efficient dynamic search method with a sample-based learning process has been implemented. Experimental results show that HighDOD outperforms other searching alternatives such as the naive top-down, bottom-up, and random methods. Points in these sparse subspaces are assumed to be the outliers. While knowing which data points are the outliers can be useful, in many applications it is more important to identify the subspaces in which a given point is an outlier, which motivates the proposal of a new technique in this paper to handle this task.
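The outlying-degree measure described in the abstract can be illustrated with a minimal sketch. It is not the HighDOD algorithm itself: it pairs the measure with a naive exhaustive enumeration of subspaces, with no pruning and no sample-based learning. The function names, the Euclidean metric, the `threshold` parameter, and the toy data are all assumptions made for illustration, and the data set is assumed not to contain the query point.

```python
from itertools import combinations
import math


def outlying_degree(data, point, subspace, k):
    """Sum of distances from `point` to its k nearest neighbors,
    measured only on the feature indices listed in `subspace`.
    Euclidean distance is an assumption of this sketch."""
    dists = sorted(
        math.sqrt(sum((row[d] - point[d]) ** 2 for d in subspace))
        for row in data
    )
    return sum(dists[:k])


def naive_outlying_subspaces(data, point, k, threshold):
    """Exhaustively test every non-empty subspace (hypothetical baseline).
    A subspace is reported when the outlying degree reaches `threshold`."""
    n_dims = len(point)
    found = []
    for size in range(1, n_dims + 1):
        for subspace in combinations(range(n_dims), size):
            if outlying_degree(data, point, subspace, k) >= threshold:
                found.append(subspace)
    return found


# Toy data: the query point is ordinary on feature 0 but far away on feature 1.
data = [(0, 0), (1, 0), (0, 1), (1, 1)]
point = (0.5, 10)
subspaces = naive_outlying_subspaces(data, point, k=2, threshold=5)
print(subspaces)
```

The exhaustive loop visits all 2^d - 1 non-empty subspaces of a d-dimensional data set, which is the cost that the heuristic pruning strategies mentioned in the abstract are meant to reduce.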

References (10)
Fabrizio Angiulli, Clara Pizzuti, Fast Outlier Detection in High Dimensional Spaces. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 15-26 (2002), 10.1007/3-540-45681-3_2
Raymond T. Ng, Edwin M. Knorr, Algorithms for Mining Distance-Based Outliers in Large Datasets. Very Large Data Bases, pp. 392-403 (1998)
Arthur E. Mace, Sample-Size Determination (1964)
Raymond T. Ng, Edwin M. Knorr, Finding Intensional Knowledge of Distance-Based Outliers. Very Large Data Bases, pp. 211-222 (1999)
Wen Jin, Anthony K. H. Tung, Jiawei Han, Mining top-n local outliers in large databases. Knowledge Discovery and Data Mining, pp. 293-298 (2001), 10.1145/502512.502554
S. Papadimitriou, H. Kitagawa, P.B. Gibbons, C. Faloutsos, LOCI: fast outlier detection using the local correlation integral. International Conference on Data Engineering, pp. 315-326 (2003), 10.1109/ICDE.2003.1260802
Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim, Efficient algorithms for mining outliers from large data sets. International Conference on Management of Data, vol. 29, pp. 427-438 (2000), 10.1145/335191.335437
Micheline Kamber, Jiawei Han, Jian Pei, Data Mining: Concepts and Techniques (2000)
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander, LOF: identifying density-based local outliers. International Conference on Management of Data, vol. 29, pp. 93-104 (2000), 10.1145/335191.335388
Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel, The X-tree: an index structure for high-dimensional data. Very Large Data Bases, pp. 451-462 (2001), 10.1016/B978-155860651-7/50124-8