Author: Guansong Pang
Keywords:
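Abstract: Anomaly detection is regarded as one of the most important tasks in data mining due to its wide application in various domains, such as finance, information security, healthcare and earth science. With advancements in data collection techniques, the volume and dimensionality of anomaly detection data sets increase explosively, and diverse attribute types occur within these data sets. Also, in many data sets, anomalies can be detected in some attributes only, while the other attributes are irrelevant to anomaly detection. All of these characteristics pose new challenges to existing anomaly detection techniques. Motivated by this fact, this research aims to design an anomaly detection method which can scale up to large, high dimensional data, is able to identify anomalies in data with different types of attributes, and tolerates irrelevant attributes. This thesis posits that anomalies are instances with low probabilities in subspaces of a data set. So, in a random subset of a data set, anomalies have higher probabilities of having zero appearances in subspaces than normal instances. Based on this property, this thesis proposes a novel anomaly detection method called ZERO++, which employs the number of zero appearances in subspaces to detect anomalies. ZERO++ is the only anomaly detector based on zero appearances in subspaces, as far as we know. It is unique in that it works in regions of subspaces that are not occupied by data, whereas existing methods work in regions occupied by data. Utilising the anti-monotone property: `if an instance has zero appearances in a subspace, it must also have zero appearances in subspaces containing that subspace', we show that only a small number of subspaces needs to be considered to detect anomalies effectively. ZERO++ is an efficient algorithm with linear time complexity with respect to data size and data dimensionality, and it detects anomalies effectively with only a small percentage of relevant attributes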
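
The zero-appearance idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `zero_appearance_scores`, its parameters and defaults, the restriction to categorical attributes, and the use of all two-attribute subspaces are assumptions made here for illustration (the choice of low-dimensional subspaces is loosely motivated by the anti-monotone property). For each random subsample, the sketch counts the subspaces in which an instance's value combination does not appear at all; a higher count suggests a more anomalous instance.

```python
import random
from itertools import combinations


def zero_appearance_scores(data, n_subsamples=50, subsample_size=8,
                           subspace_dim=2, seed=0):
    """Score each row by how often its value combination is absent
    from random subsamples, summed over low-dimensional subspaces.

    Hypothetical illustration only: the name, parameters and defaults
    are assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    n_attrs = len(data[0])
    # Assumption: enumerate every subspace of the given dimensionality.
    subspaces = list(combinations(range(n_attrs), subspace_dim))
    scores = [0] * len(data)

    for _ in range(n_subsamples):
        # Draw a small random subset of the data set without replacement.
        sample = rng.sample(data, min(subsample_size, len(data)))
        for attrs in subspaces:
            # Value combinations that occur in this subsample's subspace.
            seen = {tuple(row[a] for a in attrs) for row in sample}
            for i, row in enumerate(data):
                # A "zero appearance": the row's combination never occurs.
                if tuple(row[a] for a in attrs) not in seen:
                    scores[i] += 1
    return scores
```

As a usage example, given 30 copies of `("red", "small", "round")`, 30 copies of `("blue", "large", "square")` and a single `("red", "large", "round")` row, the last row should receive by far the highest score: its combinations `("red", "large")` and `("large", "round")` are absent from every subsample that does not happen to contain the row itself. The cost of this sketch grows linearly with the number of rows for a fixed number of subsamples and subspaces, but it enumerates all attribute pairs, so it does not reproduce the linear scaling in dimensionality that the abstract reports.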