Good and Bad Neighborhood Approximations for Outlier Detection Ensembles

Authors: Evelyn Kirner, Erich Schubert, Arthur Zimek

DOI: 10.1007/978-3-319-68474-1_12

Abstract: Outlier detection methods have used approximate neighborhoods in filter-refinement approaches. Outlier detection ensembles have used artificially obfuscated neighborhoods to achieve diverse ensemble members. Here we argue that outlier detection models could be based on approximate neighborhoods in the first place, thus gaining in both efficiency and effectiveness. It depends, however, on the type of approximation, as only some seem beneficial for the task of outlier detection, while no (large) benefit can be seen for others. In particular, we argue that space-filling curves are beneficial approximations, as they have a stronger tendency to underestimate the density in sparse regions than in dense regions. In comparison, LSH and NN-Descent do not have such a tendency and do not seem to be beneficial for the construction of outlier detection ensembles.
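To make the central idea concrete, here is a minimal sketch in Python (not the authors' implementation) of a kNN-distance outlier score computed from approximate neighborhoods. It assumes 2-D data, a Z-order (Morton) space-filling curve, a candidate window of `window` points on each side of every point in curve order, and an ensemble built by averaging scores over randomly shifted curves. All function names and parameters (`zorder_knn_scores`, `window`, `members`, and so on) are hypothetical. Because the candidates are a subset of the data, the approximate k-NN distance can never be smaller than the exact one, so this kind of approximation can only underestimate density.

```python
import math
import random


def morton_code(x: int, y: int, bits: int) -> int:
    """Interleave the bits of two non-negative grid coordinates (Z-order / Morton code)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def knn_distance(points, idx, candidates, k):
    """Distance from points[idx] to its k-th nearest neighbor among the candidate indices."""
    p = points[idx]
    dists = sorted(math.dist(p, points[j]) for j in candidates if j != idx)
    return dists[k - 1]


def exact_knn_scores(points, k):
    """Exact kNN-distance outlier scores (brute force, O(n^2 log n))."""
    n = len(points)
    return [knn_distance(points, i, range(n), k) for i in range(n)]


def zorder_knn_scores(points, k, window, shift=(0.0, 0.0), bits=16):
    """Approximate kNN-distance scores (hypothetical helper): the only candidates for
    each point are the `window` points before and after it in Z-order. Since the
    candidates are a subset of the data, the result is never below the exact score."""
    if window < k:
        raise ValueError("window must be at least k")
    scale = (1 << bits) - 1
    bounds = [(min(p[d] for p in points), max(p[d] for p in points)) for d in (0, 1)]

    def cell(p, d):
        lo, hi = bounds[d]
        u = (p[d] - lo) / (hi - lo) if hi > lo else 0.0  # normalize to [0, 1]
        return int((u + shift[d]) / 2.0 * scale)          # shift into [0, 2), map onto the grid

    order = sorted(range(len(points)),
                   key=lambda i: morton_code(cell(points[i], 0), cell(points[i], 1), bits))
    scores = [0.0] * len(points)
    for pos, i in enumerate(order):
        candidates = order[max(0, pos - window): pos + window + 1]
        scores[i] = knn_distance(points, i, candidates, k)
    return scores


def zorder_ensemble_scores(points, k, window, members, seed=0):
    """Average the approximate scores over `members` randomly shifted Z-order curves."""
    rng = random.Random(seed)
    total = [0.0] * len(points)
    for _ in range(members):
        shift = (rng.random(), rng.random())  # each shift component in [0, 1)
        for i, s in enumerate(zorder_knn_scores(points, k, window, shift)):
            total[i] += s
    return [t / members for t in total]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
    data.append((8.0, 8.0))  # one obvious outlier, index 200
    exact = exact_knn_scores(data, k=5)
    approx = zorder_ensemble_scores(data, k=5, window=10, members=5)
    print("top exact:", max(range(len(data)), key=exact.__getitem__))
    print("top approx:", max(range(len(data)), key=approx.__getitem__))
```

The sketch only illustrates the direction of the approximation error. Whether that error is larger in sparse than in dense regions, as argued above for space-filling curves, depends on the data and the curve and is not demonstrated by this toy example.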
