Evaluation and model selection for unsupervised outlier detection and one-class classification

Author: Henrique Oliveira Marques

Abstract: Outlier detection (or anomaly detection) plays an important role in discovering patterns in data that can be considered exceptional in some sense. An important distinction is that between supervised, semi-supervised, and unsupervised techniques. In this work, we focus on semi-supervised and unsupervised techniques. It has been shown that unsupervised outlier detection techniques can be adapted to the semi-supervised setting. Therefore, we conduct a comparative study between semi-supervised techniques and unsupervised techniques adapted to the semi-supervised context. The main focus of this work, however, is on the unsupervised evaluation of outlier detection. Although there is a large and growing literature on the outlier detection problem, the unsupervised evaluation of outlier detection results remains virtually untouched, especially in the context of unsupervised detection. So-called internal evaluation, based solely on the data and the assessed solutions themselves, is required if one wants to statistically validate (in absolute terms) or just compare (in relative terms) the solutions provided by different algorithms, or by different parameterizations of a given algorithm, in the absence of labeled data. However, in contrast to cluster analysis, where indexes for internal evaluation and validation of clustering solutions have been conceived and shown to be very useful, this problem has been notably overlooked in the outlier detection domain. Here we discuss this problem and provide solutions for the internal evaluation of outlier detection results. In the scenario of semi-supervised …
