Fast hierarchical clustering and other applications of dynamic closest pairs

作者: David Eppstein

DOI: 10.1145/351827.351829

关键词: Data structureHeuristicsNearest-neighbor chain algorithmCombinatoricsHierarchical clusteringMatching (graph theory)Set (abstract data type)AlgorithmClosest pair of points problemQuadtreeMathematics

摘要: We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based a technique previously used by author Euclidean pairs, we show how to insert and delete objects an n-object set, maintaining pair, in O(n log2 n) time per update O(n) space. With quadratic space, can instead use quadtree-like achieve optimal bound, update. apply these hierarchical clustering, greedy matching, TSP heuristics, discuss other potential applications machine learning, Grobner bases, local improvement algorithms partition placement problems. Experiments our new methods be faster practice than heuristics.

参考文章(33)
Yossi Matias, Semi-dynamic Closest-pair Algorithms. canadian conference on computational geometry. pp. 264- 271 ,(1993)
Bruno Buchberger, Applications of Gro¨bner bases in non-linear computational geometry Proceedings of the International Symposium on Trends in Computer Algebra. ,vol. 14, pp. 52- 80 ,(1988)
Geoffrey H. Ball, David J. Hall, ISODATA, A NOVEL METHOD OF DATA ANALYSIS AND PATTERN CLASSIFICATION Stanford Research Institute. ,(1965)
Michael J. Pazzani, Constructive Induction of Cartesian Product Attributes Springer, Boston, MA. pp. 341- 354 ,(1998) , 10.1007/978-1-4615-5725-8_21
Edward M. Reingold, Robert E. Tarjan, ON A GREEDY HEURISTIC FOR COMPLETE MATCHING SIAM Journal on Computing. ,vol. 10, pp. 676- 681 ,(1981) , 10.1137/0210050
Alan Frieze, Colin McDiarmid, Bruce Reed, Greedy matching on the line SIAM Journal on Computing. ,vol. 19, pp. 666- 672 ,(1990) , 10.1137/0219045
Richa Agarwala, Vineet Bafna, Martin Farach, Mike Paterson, Mikkel Thorup, On the Approximability of Numerical Taxonomy (Fitting Distances by Tree Metrics) SIAM Journal on Computing. ,vol. 28, pp. 1073- 1085 ,(1999) , 10.1137/S0097539795296334
H. D. Vinod, Patrick L. Odell, Benjamin S. Duran, Cluster Analysis: A Survey ,(1974)
Alessandro Giovini, Teo Mora, Gianfranco Niesi, Lorenzo Robbiano, Carlo Traverso, “One sugar cube, please” or selection strategies in the Buchberger algorithm international symposium on symbolic and algebraic computation. pp. 49- 54 ,(1991) , 10.1145/120694.120701
Joe H. Ward, Hierarchical Grouping to Optimize an Objective Function Journal of the American Statistical Association. ,vol. 58, pp. 236- 244 ,(1963) , 10.1080/01621459.1963.10500845