Authors: Kenji Yamanishi, Jun-Ichi Takeuchi, Graham Williams, Peter Milne
Keywords: One-class classification, Mixture model, Outlier, Categorical variable, Artificial intelligence, Statistical learning theory, Anomaly detection, Data mining, Machine learning, Statistical model, Intrusion detection system, Unsupervised learning, Algorithm, Computer science
Abstract: Outlier detection is a fundamental issue in data mining, specifically in fraud detection, network intrusion monitoring, etc. SmartSifter is an outlier detection engine that addresses this problem from the viewpoint of statistical learning theory. This paper provides a theoretical basis for SmartSifter and empirically demonstrates its effectiveness. SmartSifter detects outliers in an on-line process through the unsupervised learning of a probabilistic model (using a finite mixture model) of the information source. Each time a datum is input, SmartSifter employs a discounting learning algorithm to update the model. The datum is then given a score based on the learned model, with a high score indicating a high possibility of being an outlier. The novel features of SmartSifter are: (1) it is adaptive to non-stationary sources of data; (2) its scores have a clear statistical/information-theoretic meaning; (3) it is computationally inexpensive; and (4) it can handle both categorical and continuous variables. An experimental application to network intrusion detection shows that SmartSifter was able to identify data with high scores that corresponded to attacks, at low computational cost. A further application identified a number of meaningful rare cases in actual health insurance pathology data from Australia's Health Insurance Commission.
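The abstract describes the on-line loop only at a high level. The following is a minimal Python sketch that assumes a one-dimensional Gaussian mixture. Each datum is first scored under the model learned from past data. The model's sufficient statistics are then decayed by a factor (1 - r) and updated with the new datum, so old data are gradually forgotten. The class name `DiscountedGaussianMixture`, the helper `process_stream`, the discounting rate `r`, the variance floor `min_var`, and the use of the logarithmic loss -log p(x) as the score are all hypothetical choices made for illustration. The paper's own update rules and score definitions may differ, and the sketch does not cover categorical variables.

```python
import math
import random


def _gauss(x, mu, var):
    """Density of a 1-D normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


class DiscountedGaussianMixture:
    """Hypothetical sketch of on-line discounting learning for a 1-D Gaussian mixture.

    Assumption: a simplified, illustrative update, not the paper's exact algorithm.
    """

    def __init__(self, means, variances, r=0.01, min_var=1e-3):
        k = len(means)
        self.r = r              # discounting rate: larger r forgets old data faster
        self.min_var = min_var  # variance floor so a component cannot collapse
        # Discounted sufficient statistics per component:
        # c = mixing weight, s1 = weighted sum of x, s2 = weighted sum of x**2.
        self.c = [1.0 / k] * k
        self.s1 = [c * m for c, m in zip(self.c, means)]
        self.s2 = [c * (v + m * m) for c, m, v in zip(self.c, means, variances)]

    def _params(self):
        # Recover (weight, mean, variance) of each component from the statistics.
        for c, s1, s2 in zip(self.c, self.s1, self.s2):
            c = max(c, 1e-300)  # guard against underflow of a long-unused component
            mu = s1 / c
            var = max(s2 / c - mu * mu, self.min_var)
            yield c, mu, var

    def density(self, x):
        return sum(w * _gauss(x, mu, var) for w, mu, var in self._params())

    def score(self, x):
        # Logarithmic loss of x under the current model: a high value means
        # x is unlikely given the past, i.e. a candidate outlier.
        return -math.log(max(self.density(x), 1e-300))

    def update(self, x):
        # E-step on the new datum only: responsibility of each component.
        joint = [w * _gauss(x, mu, var) for w, mu, var in self._params()]
        total = sum(joint)
        k = len(joint)
        gammas = [j / total for j in joint] if total > 0 else [1.0 / k] * k
        # Discounted M-step: old statistics decay by (1 - r) and the new
        # datum enters with weight r, so the weights c keep summing to 1.
        r = self.r
        for i, g in enumerate(gammas):
            self.c[i] = (1 - r) * self.c[i] + r * g
            self.s1[i] = (1 - r) * self.s1[i] + r * g * x
            self.s2[i] = (1 - r) * self.s2[i] + r * g * x * x


def process_stream(model, stream):
    """Score each datum before learning from it, as in on-line detection."""
    scores = []
    for x in stream:
        scores.append(model.score(x))
        model.update(x)
    return scores


if __name__ == "__main__":
    random.seed(0)
    model = DiscountedGaussianMixture(means=[-1.0, 1.0], variances=[1.0, 1.0], r=0.02)
    data = [random.gauss(0.0, 1.0) for _ in range(500)] + [12.0]
    scores = process_stream(model, data)
    print("index of highest score:", max(range(len(scores)), key=scores.__getitem__))
```

Because scoring happens before the update, a datum is judged only against what the model has already learned. The discounting rate `r` trades stability for adaptivity: an effective memory of roughly 1/r recent data lets the model track a non-stationary source, which mirrors the adaptivity property claimed in the abstract.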