Authors: Heiko Paulheim, Robert Meusel
DOI: 10.1007/S10994-015-5507-Y
Keywords:
Abstract: Outlier detection methods automatically identify instances that deviate from the majority of the data. In this paper, we propose a novel approach for unsupervised outlier detection, which re-formulates the outlier detection problem in numerical data as a set of supervised regression learning problems. For each attribute, we learn a predictive model which predicts the values of that attribute from the values of all other attributes, and compute the deviations between the predictions and the actual values. From those deviations, we derive both a weight for each attribute and a final outlier score using those weights. The weights help separate the relevant attributes from the irrelevant ones, and thus make the approach well suited for discovering outliers otherwise masked in high-dimensional data. An empirical evaluation shows that our approach outperforms existing algorithms, and is particularly robust in datasets with many attributes. Furthermore, we show that if a symbolic machine learning method is used to solve the individual learning problems, the approach is also capable of generating concise explanations for the detected outliers.
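The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the leave-one-out k-nearest-neighbour regressor, the weight formula (one minus the relative root squared error, clamped at zero), the weighted root-mean-square aggregation, and the names `knn_predict` and `attribute_wise_outlier_scores` are all assumptions chosen for illustration. The toy data assumes attributes on comparable scales.

```python
import math


def knn_predict(rows, i, target, k=3):
    """Predict rows[i][target] from the k nearest *other* rows.

    Distances use every attribute except the target, so each call is a
    small supervised regression problem (assumed learner: k-NN mean).
    """
    others = [c for c in range(len(rows[i])) if c != target]

    def dist2(j):
        return sum((rows[i][c] - rows[j][c]) ** 2 for c in others)

    nearest = sorted((j for j in range(len(rows)) if j != i), key=dist2)[:k]
    return sum(rows[j][target] for j in nearest) / len(nearest)


def attribute_wise_outlier_scores(rows, k=3):
    """Return (attribute weights, per-instance outlier scores)."""
    n, m = len(rows), len(rows[0])

    # Deviation between actual and predicted value, per instance and attribute.
    errors = [[rows[i][a] - knn_predict(rows, i, a, k) for a in range(m)]
              for i in range(n)]

    # Assumed weighting: attributes that are predicted much better than by
    # their mean get a weight near 1; unpredictable ones get weight 0.
    weights = []
    for a in range(m):
        mean = sum(r[a] for r in rows) / n
        baseline = sum((r[a] - mean) ** 2 for r in rows)
        model = sum(errors[i][a] ** 2 for i in range(n))
        rrse = math.sqrt(model / baseline) if baseline > 0 else 1.0
        weights.append(max(0.0, 1.0 - rrse))

    total = sum(weights)
    if total == 0:
        return weights, [0.0] * n

    # Assumed aggregation: weighted root mean square of the deviations.
    scores = [math.sqrt(sum(w * e * e for w, e in zip(weights, errors[i])) / total)
              for i in range(n)]
    return weights, scores


# Toy data: attribute 1 is twice attribute 0; attribute 2 is unrelated.
rows = [[float(i), 2.0 * i, float((7 * i) % 5)] for i in range(20)]
rows.append([10.0, 0.0, 2.0])  # breaks the "attribute 1 = 2 * attribute 0" pattern

weights, scores = attribute_wise_outlier_scores(rows)
print("weights:", [round(w, 2) for w in weights])
print("most outlying row:", rows[scores.index(max(scores))])
```

In this toy setting the unrelated third attribute cannot be predicted from the other two, so it receives no weight and does not dilute the score of the row that violates the relationship between the first two attributes. A symbolic learner in place of the k-NN regressor would be needed for the explanation capability the abstract mentions; this sketch does not attempt that.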