Less is More: Building Selective Anomaly Ensembles

Authors: Shebuti Rayana, Leman Akoglu

DOI: 10.1145/2890508

Abstract: Ensemble learning for anomaly detection has been barely studied, due to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work taps into this gap and builds a new ensemble approach for anomaly detection, with application to event detection in temporal graphs as well as outlier detection in no-graph settings. It handles and combines multiple heterogeneous detectors to yield improved and robust performance. Importantly, trusting the results from all constituent detectors may deteriorate the overall performance of the ensemble, as some detectors could provide inaccurate results depending on the type of data in hand and the underlying assumptions of a detector. This suggests that combining the detectors selectively is key to building effective anomaly ensembles, hence “less is more”. In this paper we propose a novel ensemble approach called SELECT for anomaly detection, which automatically and systematically selects the results from constituent detectors to combine in a fully unsupervised fashion. We apply our method to event detection in temporal graphs and outlier detection in multi-dimensional point data (no-graph), where SELECT successfully utilizes five base detectors and seven consensus methods under a unified ensemble framework. We provide extensive quantitative evaluation of our approach for event detection on real-world datasets (four with ground truth events), including Enron email communications, RealityMining SMS and phone call records, the New York Times news corpus, and the World Cup 2014 Twitter feed. We also provide results for outlier detection on real-world multi-dimensional point datasets from the UCI Machine Learning Repository. Thanks to its selection mechanism, SELECT yields superior performance compared to the individual detectors alone, the full ensemble (naively combining all results), an existing diversity-based ensemble, and an existing weighted ensemble approach.
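The abstract does not spell out how SELECT chooses which detectors to combine, so the following is not the authors' algorithm. It is a minimal toy sketch of the general "less is more" idea, assuming a simple agreement-based filter: each detector's scores are turned into ranks, a preliminary consensus is formed by averaging, and only detectors that correlate positively with that consensus are kept for the final average. The names `rank_normalize`, `selective_average`, `min_corr`, and `demo_scores` are all hypothetical.

```python
import numpy as np

def rank_normalize(scores):
    """Map each detector's raw scores to ranks in [0, 1] (1 = most anomalous).

    `scores` has shape (n_detectors, n_points). Ties are broken arbitrarily
    by argsort, which is acceptable for a sketch.
    """
    scores = np.asarray(scores, dtype=float)
    order = scores.argsort(axis=1).argsort(axis=1)  # 0 = lowest score
    return order / (scores.shape[1] - 1)

def selective_average(scores, min_corr=0.0):
    """Average only the detectors that agree with a preliminary consensus.

    Assumption (not from the paper): agreement is measured by the Pearson
    correlation between a detector's ranks and the mean rank of all detectors.
    """
    ranks = rank_normalize(scores)
    consensus = ranks.mean(axis=0)  # unsupervised stand-in for ground truth
    corr = np.array([np.corrcoef(r, consensus)[0, 1] for r in ranks])
    keep = corr > min_corr
    if not keep.any():  # fall back to the full ensemble
        keep[:] = True
    return ranks[keep].mean(axis=0), keep

# Made-up scores: three detectors flag the last point, one is inverted.
demo_scores = [
    [0.1, 0.2, 0.3, 0.4, 5.0],
    [0.2, 0.1, 0.4, 0.3, 9.0],
    [0.1, 0.3, 0.2, 0.4, 7.0],
    [5.0, 0.4, 0.3, 0.2, 0.1],
]
combined, keep = selective_average(demo_scores)
```

In this made-up example the inverted fourth detector is dropped and the last point receives the highest combined rank, whereas a plain average over all four detectors would dilute that signal.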
