Class separation through variance: a new application of outlier detection

Authors: Andrew Foss, Osmar R. Zaïane

DOI: 10.1007/S10115-010-0347-3

Abstract: This paper introduces a new outlier detection approach and discusses and extends a new concept, class separation through variance. We show that even for balanced and concentric classes differing only in variance, accumulating information about the outlierness of points in multiple subspaces leads to a ranking in which the classes naturally tend to separate. Exploiting this leads to a highly effective and efficient unsupervised class separation approach. Unlike typical outlier detection algorithms, this method can be applied beyond the ‘rare classes’ case with great success. The new algorithm, FASTOUT, introduces a number of novel features. It employs sampling and is efficient. It handles arbitrarily sized subspaces and converges to an optimal subspace size through the use of an objective function. In addition, two approaches are presented for automatically deriving the classes of the data from the ranking. Experiments show that FASTOUT typically outperforms other state-of-the-art outlier detection methods on high-dimensional data, such as Feature Bagging, SOE1, LOF, ORCA and Robust Mahalanobis Distance, and competes with leading supervised classification methods in separating classes.
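
The central idea in the abstract, that summing outlierness over many subspaces yields a ranking in which concentric classes of different variance drift apart, can be illustrated with a small sketch. This is not FASTOUT: the k-nearest-neighbour distance score, the fixed subspace size, the synthetic Gaussian data, and every name below (`subspace_outlier_scores`, `n_subspaces`, `subspace_dim`, `k`) are assumptions chosen for illustration only. The final cut at the halfway point of the ranking relies on knowing the classes are balanced, which stands in for the class-derivation approaches the paper presents.

```python
import numpy as np


def subspace_outlier_scores(X, n_subspaces=50, subspace_dim=3, k=5, seed=0):
    """Accumulate a simple outlierness score over random feature subsets.

    Hypothetical helper: the score in each subspace is the distance to the
    k-th nearest neighbour, a stand-in for whatever measure a real method uses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros(n)
    for _ in range(n_subspaces):
        # Pick a random subset of features (one subspace).
        dims = rng.choice(d, size=subspace_dim, replace=False)
        P = X[:, dims]
        # Pairwise squared Euclidean distances inside the subspace.
        sq = np.sum(P ** 2, axis=1)
        D2 = sq[:, None] + sq[None, :] - 2.0 * (P @ P.T)
        np.maximum(D2, 0.0, out=D2)  # clamp tiny negatives from rounding
        # Sorted position 0 is the point itself, so position k is the
        # k-th nearest neighbour.
        kth = np.sqrt(np.partition(D2, k, axis=1)[:, k])
        scores += kth
    return scores


# Two balanced, concentric classes that differ only in variance (assumed data).
rng = np.random.default_rng(42)
n_per_class, d = 200, 20
low_var = rng.normal(0.0, 1.0, size=(n_per_class, d))
high_var = rng.normal(0.0, 3.0, size=(n_per_class, d))
X = np.vstack([low_var, high_var])
labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)].astype(int)

scores = subspace_outlier_scores(X)

# Rank by accumulated score and label the top half as the high-variance class.
predicted = np.zeros(len(X), dtype=int)
predicted[np.argsort(-scores)[:n_per_class]] = 1
accuracy = float(np.mean(predicted == labels))
print(f"agreement with true classes: {accuracy:.2f}")
```

Points from the high-variance class sit in sparser regions of most subspaces, so their accumulated scores tend to be larger even though both classes share the same centre. No labels are used to build the ranking; they appear only in the final comparison.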
