Authors: Ivo Marinchev, Gennady Agre
Keywords:
Abstract: The paper presents practical approaches and techniques for speeding up implementations of the nearest neighbour search/classification algorithm for high-dimensional data and/or many training examples. Such settings often appear in the fields of big data and data mining. We apply a fast iterative form of the polar decomposition and use the computed matrix to pre-select a smaller number of candidate classes for the query element. We show that additional speed-up can be achieved when a class consists of many instances, by subdividing them into subclasses using an approximation of some clustering algorithm; the resulting classification is then used for building the matrix. Our pre-processing step (whose cost depends linearly or near-linearly on the number of training examples and dimensions) and pre-selection step (whose cost depends on the number of classes) can be combined with any well-known indexing method, such as the annulus method, kd-trees, metric trees, r-trees, cover trees, etc., to limit the search process. Finally, we introduce what we name a cluster index and show that in practice it extends the applicability of indexing structures with higher-order complexity to bigger datasets.
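The abstract does not say which iterative form of the polar decomposition is used. As an illustration only, the sketch below uses the classical Newton iteration, X_{k+1} = (X_k + X_k^{-T}) / 2, which is one common choice and is an assumption here, not necessarily the authors' method. The function name `polar_newton`, its parameters, and the sample matrix are hypothetical, and the input is assumed to be square and nonsingular. How the computed matrix is then used to pre-select candidate classes is not described in the abstract and is not shown.

```python
import numpy as np


def polar_newton(a, tol=1e-12, max_iter=100):
    """Polar decomposition a = u @ h via Newton's iteration.

    Hypothetical helper (not taken from the paper). Assumes `a` is a
    square, nonsingular matrix. Returns an orthogonal factor `u` and a
    symmetric positive definite factor `h`.
    """
    a = np.asarray(a, dtype=float)
    x = a.copy()
    for _ in range(max_iter):
        # Newton step: average the iterate with its inverse transpose.
        x_next = 0.5 * (x + np.linalg.inv(x).T)
        # Stop once the relative change in Frobenius norm is negligible.
        change = np.linalg.norm(x_next - x, "fro")
        x = x_next
        if change <= tol * np.linalg.norm(x, "fro"):
            break
    u = x
    h = u.T @ a
    h = 0.5 * (h + h.T)  # symmetrise to remove rounding noise
    return u, h


# Small illustrative matrix (made up for this sketch).
a = np.array([[4.0, 1.0], [2.0, 3.0]])
u, h = polar_newton(a)
```

For a well-conditioned matrix the iteration converges quadratically, so a handful of steps usually suffices; in this sketch each step costs one matrix inversion.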