A decomposition of the outlier detection problem into a set of supervised learning problems

Authors: Heiko Paulheim, Robert Meusel

DOI: 10.1007/S10994-015-5507-Y

Abstract: Outlier detection methods automatically identify instances that deviate from the majority of the data. In this paper, we propose a novel approach for unsupervised outlier detection, which re-formulates the outlier detection problem in numerical data as a set of supervised regression learning problems. For each attribute, we learn a predictive model which predicts the values of that attribute from the values of all other attributes, and compute the deviations between the predictions and the actual values. From those deviations, we derive both a weight for each attribute and a final outlier score using those weights. The weights help separate the relevant attributes from the irrelevant ones, and thus make the approach well suited to discovering outliers otherwise masked in high-dimensional data. An empirical evaluation shows that our approach outperforms existing algorithms, and is particularly robust in datasets with many irrelevant attributes. Furthermore, we show that if a symbolic machine learning method is used to solve the individual learning problems, the approach is also capable of generating concise explanations for the detected outliers.
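
The procedure outlined in the abstract (one regression problem per attribute, deviations between predictions and actual values, then per-attribute weights and a weighted score) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting or scoring formulas, so the weight used here (one minus the error relative to a mean predictor, clipped to [0, 1]) and the weighted root-mean-square score are assumptions, as are the function name `attribute_wise_outlier_scores`, the use of ordinary least squares as the predictive model, and the simple k-fold scheme for out-of-sample predictions.

```python
import numpy as np


def attribute_wise_outlier_scores(X, n_folds=5, seed=0):
    """Return (scores, weights) for the rows and columns of a numeric matrix X."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Standardize so that deviations are comparable across attributes.
    std = X.std(axis=0)
    constant = std == 0
    std[constant] = 1.0
    Z = (X - X.mean(axis=0)) / std

    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)

    sq_err = np.zeros((n, d))
    for k in range(d):
        # Predict attribute k from all other attributes (plus an intercept).
        A = np.column_stack([np.delete(Z, k, axis=1), np.ones(n)])
        for test_idx in folds:
            train = np.ones(n, dtype=bool)
            train[test_idx] = False
            # Assumption: ordinary least squares stands in for the learner.
            coef, *_ = np.linalg.lstsq(A[train], Z[train, k], rcond=None)
            sq_err[test_idx, k] = (A[test_idx] @ coef - Z[test_idx, k]) ** 2

    # Assumed weighting: on standardized data a mean predictor has squared
    # error 1, so the root of the mean squared error is a relative error.
    relative_error = np.sqrt(sq_err.mean(axis=0))
    weights = 1.0 - np.minimum(1.0, relative_error)
    weights[constant] = 0.0  # constant attributes carry no information

    total = weights.sum()
    if total == 0:
        return np.zeros(n), weights
    # Assumed score: weighted root-mean-square deviation per instance.
    scores = np.sqrt((sq_err * weights).sum(axis=1) / total)
    return scores, weights


# Hypothetical data: x1 depends on x0, x2 is unrelated noise, and row 17
# breaks the x0/x1 relationship without being extreme in any single attribute.
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
X = np.column_stack([x0, 2 * x0 + rng.normal(scale=0.05, size=200),
                     rng.normal(size=200)])
X[17] = [1.0, -2.0, 0.0]
scores, weights = attribute_wise_outlier_scores(X)
print("most anomalous row:", int(np.argmax(scores)))
print("attribute weights:", np.round(weights, 2))
```

In this sketch an attribute that cannot be predicted from the others receives a weight near zero, so its (necessarily large) deviations do not drown out deviations in the attributes that do follow a learnable pattern. Swapping the least-squares step for a symbolic learner such as a regression tree would be the natural place to obtain the kind of explanations the abstract mentions.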
