Disk-Based Sampling for Outlier Detection in High Dimensional Data

Authors: Pei Sun, Timothy de Vries, Sanjay Chawla, Gia Vinh Anh Pham


Keywords: Phase (waves), Data mining, Clustering high-dimensional data, Curse of dimensionality, Sampling (statistics), Computer science, Data set, Algorithm, Outlier, Set (abstract data type), Anomaly detection

Abstract: We propose an efficient sampling based outlier detection method for large high-dimensional data. Our method consists of two phases. In the first phase, we combine a “sampling” strategy with a simple randomized partitioning technique to generate a candidate set of outliers. This phase requires one full data scan, and the running time has linear complexity with respect to the size and dimensionality of the data set. An additional data scan, which constitutes the second phase, extracts the actual outliers from the candidate set. The running time of this phase has complexity O(CN), where C and N are the sizes of the candidate set and the data set respectively. The major strengths of the proposed approach are that (1) no partitioning of the dimensions is required, thus making it particularly suitable for high dimensional data, and (2) a small sampling set (0.5% of the original data set) can discover more than 99% of all the outliers identified by a brute-force approach. We present a detailed experimental evaluation of our proposed method on real and synthetic data sets and compare it with another …
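
The two-phase structure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: it assumes a distance-based outlier score (distance to the k-th nearest neighbour), it omits the randomized partitioning step entirely, and the names `two_phase_outliers`, `sample_frac`, `n_candidates`, and `n_outliers` are hypothetical.

```python
import math
import random


def kth_nn_distance(point, others, k):
    """Distance from `point` to its k-th nearest neighbour among `others`."""
    dists = sorted(math.dist(point, o) for o in others)
    return dists[k - 1]


def two_phase_outliers(data, k=1, sample_frac=0.1, n_candidates=5,
                       n_outliers=1, seed=0):
    """Illustrative two-scan outlier search (assumes len(data) > k)."""
    rng = random.Random(seed)
    n = len(data)
    # Assumption: a uniform random sample stands in for the paper's sampling step.
    sample_size = min(n, max(k + 1, int(sample_frac * n)))
    sample_idx = rng.sample(range(n), sample_size)

    # Phase 1 (first scan): score every point against the sample only.
    # The k-NN distance within a subset can only overestimate the true
    # k-NN distance, so the highest scores form a candidate set.
    approx = []
    for i, p in enumerate(data):
        ref = [data[j] for j in sample_idx if j != i]
        approx.append((kth_nn_distance(p, ref, k), i))
    candidates = [i for _, i in sorted(approx, reverse=True)[:n_candidates]]

    # Phase 2 (second scan): exact score for each of the C candidates
    # against all N points, which costs O(C * N) distance computations.
    exact = []
    for i in candidates:
        ref = [data[j] for j in range(n) if j != i]
        exact.append((kth_nn_distance(data[i], ref, k), i))
    return [i for _, i in sorted(exact, reverse=True)[:n_outliers]]
```

The function returns the indices of the highest-scoring points after the second scan. In this sketch the candidate set is not guaranteed to contain every true outlier; how reliably it does so depends on the sample size and on how the data are distributed.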


References (4)

Fabrizio Angiulli, Clara Pizzuti. Fast Outlier Detection in High Dimensional Spaces. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 15-26 (2002). DOI: 10.1007/3-540-45681-3_2

Edwin M. Knorr, Raymond T. Ng. Algorithms for Mining Distance-Based Outliers in Large Datasets. Very Large Data Bases, pp. 392-403 (1998).

Douglas M. Hawkins. Identification of Outliers (1980).

Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim. Efficient Algorithms for Mining Outliers from Large Data Sets. International Conference on Management of Data, vol. 29, pp. 427-438 (2000). DOI: 10.1145/335191.335437
