Mining outlying aspects on numeric data

Authors: Lei Duan, Guanting Tang, Jian Pei, James Bailey, Akiko Campbell

DOI: 10.1007/S10618-014-0398-2

Keywords: Rank (computer programming), Artificial intelligence, Subspace topology, Anomaly detection, Pattern recognition, Data mining, Data set, Object (computer science), Mathematics, Outlier, Synthetic data, Set (abstract data type)

Abstract: When we are investigating an object in a data set, which itself may or may not be an outlier, can we identify unusual (i.e., outlying) aspects of the object? In this paper, we identify the novel problem of mining outlying aspects on numeric data. Given a query object $o$ in a multidimensional numeric data set $O$, in which subspace is $o$ most outlying? Technically, we use the rank of the probability density of an object in a subspace to measure the outlyingness of the object in the subspace. A minimal subspace where the query object is ranked the best is an outlying aspect. Computing the outlying aspects of a query object is far from trivial. A naive method has to calculate the probability densities of all objects and rank them in every subspace, which is very costly when the dimensionality is high. We systematically develop a heuristic method that is capable of searching data sets with tens of dimensions efficiently. Our empirical study using both real data and synthetic data demonstrates that our method is effective and efficient.
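The naive baseline described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `density_rank` and `naive_outlying_aspects`, the Gaussian kernel, and the single fixed `bandwidth` are all choices made here for brevity. The sketch enumerates every subspace, ranks the query object by its estimated density (rank 1 = lowest density, i.e. most outlying), and keeps the minimal subspaces that achieve the best rank.

```python
from itertools import combinations

import numpy as np


def density_rank(data, query_idx, dims, bandwidth=0.5):
    """Rank of the query object by density in subspace `dims` (1 = lowest density)."""
    sub = data[:, dims]                        # project all objects onto the subspace
    diff = sub[:, None, :] - sub[None, :, :]   # pairwise differences, shape (n, n, d)
    sq_dist = (diff ** 2).sum(axis=2)
    # Unnormalised Gaussian kernel density estimate (assumed kernel and bandwidth);
    # the dropped constant factor does not change ranks within one subspace.
    dens = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1)
    return int((dens < dens[query_idx]).sum()) + 1


def naive_outlying_aspects(data, query_idx, max_dim=None, bandwidth=0.5):
    """Exhaustively search all subspaces; cost grows exponentially with dimensionality."""
    n_dims = data.shape[1]
    max_dim = max_dim or n_dims
    best_rank, best = None, []
    # Subspaces are visited in order of increasing size, so any proper subset
    # of the current subspace has already been examined.
    for size in range(1, max_dim + 1):
        for dims in combinations(range(n_dims), size):
            rank = density_rank(data, query_idx, list(dims), bandwidth)
            if best_rank is None or rank < best_rank:
                best_rank, best = rank, [dims]
            elif rank == best_rank:
                # Keep only minimal subspaces: skip supersets of a kept subspace.
                if not any(set(kept) < set(dims) for kept in best):
                    best.append(dims)
    return best_rank, best
```

Calling `naive_outlying_aspects(data, query_idx)` on an `(n, d)` NumPy array returns the best rank together with the list of minimal subspaces (as tuples of column indices) in which the query attains it. The loop over all `2^d - 1` subspaces is exactly what makes this approach costly at high dimensionality, which is the motivation the abstract gives for a heuristic search.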
