Anomaly detection based on zero appearances in subspaces

Author: Guansong Pang

DOI: 10.4225/03/58B647D9A377B

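Abstract: Anomaly detection is regarded as one of the most important tasks in data mining due to its wide application in various domains, such as finance, information security, healthcare and earth science. With advancements in data collection techniques, the volume and dimensionality of anomaly detection data sets increase explosively, and diverse attribute types occur within these data sets. Also, in many data sets, anomalies can be detected in some attributes only, while the other attributes are irrelevant to anomaly detection. All these characteristics pose new challenges to existing anomaly detection techniques. Motivated by this fact, this research aims to design an anomaly detection method which can scale up to large, high dimensional data, is able to identify anomalies in data with different types of attributes, and tolerates irrelevant attributes. This thesis posits that anomalies are instances with low probabilities in subspaces of a data set. So, in a random subset of the data set, anomalies have higher probabilities of having zero appearances in subspaces than normal instances. Based on this property, this thesis proposes a novel anomaly detection method called ZERO++, which employs the number of zero appearances in subspaces to detect anomalies. ZERO++ is the only anomaly detector based on zero appearances in subspaces, as far as we know. It is unique in that it works in regions of subspaces that are not occupied by data, whereas existing methods work in regions occupied by data. Utilising the anti-monotone property, "if an instance has zero appearances in a subspace, it must also have zero appearances in subspaces containing that subspace", we show that only a small number of low dimensional subspaces needs to be considered to detect anomalies effectively. ZERO++ is an efficient algorithm with linear time complexity with respect to data size and dimensionality, and it works effectively on data sets in which only a small percentage of the attributes are relevant.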
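The scoring idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's ZERO++ implementation. It assumes purely categorical data, uses every pair of attributes as a 2-dimensional subspace (a choice motivated by the anti-monotone property, which suggests that low dimensional subspaces suffice), and picks the subsample size and number of subsamples arbitrarily. The function name `zero_appearance_scores` and all of its parameters are hypothetical.

```python
import random
from itertools import combinations


def zero_appearance_scores(data, n_subsamples=50, subsample_size=8, seed=0):
    """Count, per instance, how often its projection is absent from a subsample.

    data: list of equal-length tuples of categorical (hashable) values.
    Returns one integer score per instance; higher means more anomalous.
    All parameter defaults are illustrative assumptions, not values from the thesis.
    """
    rng = random.Random(seed)
    n_attrs = len(data[0])
    # Assumption: use all 2-dimensional subspaces (attribute pairs).
    subspaces = list(combinations(range(n_attrs), 2))
    scores = [0] * len(data)
    for _ in range(n_subsamples):
        sample = rng.sample(data, min(subsample_size, len(data)))
        for i, j in subspaces:
            # Value combinations that appear in this subsample for subspace (i, j).
            seen = {(row[i], row[j]) for row in sample}
            for idx, row in enumerate(data):
                # A "zero appearance": the instance's projection is not in the subsample.
                if (row[i], row[j]) not in seen:
                    scores[idx] += 1
    return scores


# Toy data: two frequent patterns plus one instance mixing values from both.
normal = [("a", "x", "p")] * 100 + [("b", "y", "q")] * 100
anomaly = [("a", "y", "p")]
scores = zero_appearance_scores(normal + anomaly)
print("anomaly score:", scores[-1], "highest normal score:", max(scores[:-1]))
```

In this toy data the mixed instance has value pairs that no normal instance shares, so those pairs are missing from almost every subsample and its count of zero appearances is far higher than that of the normal instances.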
