Subsampling for efficient and effective unsupervised outlier detection ensembles

Authors: Arthur Zimek, Matthew Gaudet, Ricardo J.G.B. Campello, Jörg Sander

DOI: 10.1145/2487575.2487676

Abstract: Outlier detection and ensemble learning are well-established research directions in data mining, yet the application of ensemble techniques to outlier detection has rarely been studied. Here, we propose and study subsampling as a technique to induce diversity among individual outlier detectors. We show analytically and experimentally that an outlier detector based on a subsample per se, besides inducing diversity, can, under certain conditions, already improve upon the results of the same outlier detector on the complete dataset. Building an ensemble on top of several subsamples further improves the results. While in the literature so far the intuition that ensembles improve over single outlier detectors has just been transferred from the classification literature, here we also justify analytically why ensembles are expected to work in the unsupervised area of outlier detection. As a side effect, running an ensemble of several outlier detectors on subsamples of the dataset is more efficient than ensembles based on other means of introducing diversity and, depending on the sample rate and the size of the ensemble, can be even more efficient than the single outlier detector on the complete data.
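The idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple k-th-nearest-neighbor distance as the base outlier score, subsamples drawn without replacement, and plain averaging of scores across ensemble members. The function names (`kth_nn_distance`, `subsample_ensemble_scores`) and all parameter defaults are hypothetical choices for illustration only.

```python
import math
import random


def kth_nn_distance(point, reference, k):
    """Distance from `point` to its k-th nearest neighbor in `reference`."""
    dists = sorted(math.dist(point, other) for other in reference)
    return dists[k - 1]


def subsample_ensemble_scores(data, k=3, sample_rate=0.5, n_members=10, seed=0):
    """Average k-NN-distance outlier scores over several random subsamples.

    Assumption: every object in `data` is scored against each subsample,
    and an object never counts as its own neighbor. Requires len(data) > k.
    """
    rng = random.Random(seed)
    n = len(data)
    # Keep at least k + 1 objects so that k neighbors remain after
    # excluding the scored object itself.
    sample_size = min(n, max(k + 1, int(sample_rate * n)))
    totals = [0.0] * n
    for _ in range(n_members):
        chosen = rng.sample(range(n), sample_size)  # without replacement
        for i, point in enumerate(data):
            reference = [data[j] for j in chosen if j != i]
            totals[i] += kth_nn_distance(point, reference, k)
    return [total / n_members for total in totals]


# Toy data (illustrative only): one Gaussian cluster plus one distant point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
data.append((12.0, 12.0))

scores = subsample_ensemble_scores(data)
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

Each ensemble member computes neighbors only within its subsample, so one member costs roughly `sample_rate` times the work of a detector on the full dataset. This is the efficiency argument of the abstract in miniature: with a sample rate of 0.1 and 5 members, the ensemble performs about half the distance computations of a single full-data run.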

References (48)
Arthur Zimek, Hans-Peter Kriegel, Erich Schubert, Peer Kröger. Interpreting and Unifying Outlier Scores. SIAM International Conference on Data Mining, pp. 13-24 (2011).
Giorgio Valentini, Francesco Masulli. Ensembles of Learning Machines. Italian Workshop on Neural Nets, vol. 2486, pp. 3-22 (2002). DOI: 10.1007/3-540-45808-5_1
Alberto Bertoni, Giorgio Valentini. Ensembles based on random projections to improve the accuracy of clustering algorithms. Italian Workshop on Neural Nets, pp. 31-37 (2005). DOI: 10.1007/11731177_5
Fabrizio Angiulli, Clara Pizzuti. Fast Outlier Detection in High Dimensional Spaces. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 15-26 (2002). DOI: 10.1007/3-540-45681-3_2
Thomas G. Dietterich. Ensemble Methods in Machine Learning. Multiple Classifier Systems, pp. 1-15 (2000). DOI: 10.1007/3-540-45014-9_1
Hoang Vu Nguyen, Hock Hee Ang, Vivekanand Gopalkrishnan. Mining outliers with ensemble of heterogeneous detectors on random subspaces. Database Systems for Advanced Applications, pp. 368-383 (2010). DOI: 10.1007/978-3-642-12026-8_29
Vic Barnett, Toby Lewis. Outliers in Statistical Data (1978).
Frank E. Grubbs. Procedures for Detecting Outlying Observations in Samples. Technometrics, vol. 11, pp. 1-21 (1969). DOI: 10.1080/00401706.1969.10490657
Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek. Angle-based outlier detection in high-dimensional data. Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 08), pp. 444-452 (2008). DOI: 10.1145/1401890.1401946
Naoki Abe, Bianca Zadrozny, John Langford. Outlier detection by active learning. Knowledge Discovery and Data Mining, pp. 504-509 (2006). DOI: 10.1145/1150402.1150459