On speeding up the implementation of nearest neighbour search and classification

作者： Ivo Marinchev , Gennady Agre

关键词:

摘要: The paper presents practical approaches and techniques to speeding up implementations of nearest neighbour search/classification algorithm for high dimensional data and/or many training examples. Such settings often appear in the fields big mining. We apply a fast iterative form polar decomposition use computed matrix pre-select smaller number candidate classes query element. show that additional speed can be achieved when consists instances by subdividing them subclasses approximation some clustering resulting classification is used building matrix. Our pre-processing (depends linearly or near on examples dimensions) pre-selection steps classes) with any well-known indexing method as annulus method, kd-trees, metric trees, r-trees, cover etc limit process. Finally we introduce what name cluster index practice it extends applicability structures higher order complexity bigger datasets.

参考文章(22)

Stephen M. Omohundro, Efficient Algorithms with Neural Network Behavior. Complex Systems. ,vol. 1, ,(1987)

Levent Ertöz, Michael S. Steinbach, Vipin Kumar, Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data. siam international conference on data mining. pp. 47- 58 ,(2003)

Richard E. Ladner, Kevin C. Zatloukal, Mary Holland Johnson, Nearest neighbor search for data compression. Data Structures, Near Neighbor Searches, and Methodology. pp. 69- 86 ,(1999)

Hans-Peter Kriegel, Peer Kröger, Erich Schubert, Arthur Zimek, Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data Advances in Knowledge Discovery and Data Mining. pp. 831- 838 ,(2009) , 10.1007/978-3-642-01307-2_86

Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Erich Schubert, Arthur Zimek, Can shared-neighbor distances defeat the curse of dimensionality? statistical and scientific database management. pp. 482- 500 ,(2010) , 10.1007/978-3-642-13818-8_34

Alexander Hinneburg, Charu C. Aggarwal, Daniel A. Keim, What Is the Nearest Neighbor in High Dimensional Spaces very large data bases. pp. 506- 515 ,(2000)

Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft, When Is ''Nearest Neighbor'' Meaningful? international conference on database theory. pp. 217- 235 ,(1999) , 10.1007/3-540-49257-7_15

Kristin P. Bennett, Usama Fayyad, Dan Geiger, Density-based indexing for approximate nearest-neighbor queries knowledge discovery and data mining. pp. 233- 243 ,(1999) , 10.1145/312129.312236

Michael E. Houle, Navigating massive data sets via local clustering knowledge discovery and data mining. pp. 547- 552 ,(2003) , 10.1145/956750.956817

10.

Jerome H. Friedman, Jon Louis Bentley, Raphael Ari Finkel, An Algorithm for Finding Best Matches in Logarithmic Expected Time ACM Transactions on Mathematical Software. ,vol. 3, pp. 209- 226 ,(1977) , 10.1145/355744.355745

On speeding up the implementation of nearest neighbour search and classification

来源期刊

我的账户

On speeding up the implementation of nearest neighbour search and classification

来源期刊

相似文章 1

Privacy Preserving Big Data Publishing: Challenges, Techniques, and Architectures

我的账户