Authors: Hoang Vu Nguyen, Emmanuel Müller, Jilles Vreeken, Fabian Keller, Klemens Böhm
DOI: 10.1137/1.9781611972832.22
Keywords:
Abstract: In many real-world applications, data is collected in multi-dimensional spaces, with the knowledge hidden in subspaces (i.e., subsets of the dimensions). It is an open research issue to select meaningful subspaces without any prior knowledge about such hidden patterns. Standard approaches, such as pairwise correlation measures or statistical approaches based on entropy, do not solve this problem; due to their restrictive pairwise analysis and the loss of information in discretization, they are bound to miss subspaces with potential clusters and outliers. In this paper, we focus on finding subspaces with strong mutual dependency in the selected dimension set. Chosen subspaces should provide a high discrepancy between clusters and outliers and enhance the detection of these patterns. To measure this, we propose a novel contrast score that quantifies mutual correlations in subspaces by considering their cumulative distributions, without having to discretize the data. In our experiments, we show that these high-contrast subspaces provide enhanced quality in cluster and outlier detection for both synthetic and real-world data.