Clustering high dimensional dynamic data streams

作者: Vladimir Braverman , Christian Sohler , Gereon Frahling , Harry Lang , Lin F. Yang

DOI:

关键词: Dimension (vector space)Bounded functionDynamic dataAlgorithmStreaming algorithmCluster analysisMetric spaceCoresetExponential functionComputer scienceEuclidean space

摘要: We present data streaming algorithms for the k-median problem in high-dimensional dynamic geometric streams, i.e. streams allowing both insertions and deletions of points from a discrete Euclidean space {1, 2,... Δ}d. Our use ke-2poly(d log Δ) space/time maintain with high probability small weighted set (a coreset) such that every k centers cost coreset (1 + e)-approximates streamed point set. also provide guarantee only positive weights additional logarithmic factors time complexities. can this positively-weighted to compute e)-approximation by any efficient offline algorithm. All previous computing over required exponential d. be generalized metric spaces bounded doubling dimension.

参考文章(26)
Moses Charikar, Kevin Chen, Martin Farach-Colton, Finding Frequent Items in Data Streams international colloquium on automata languages and programming. ,vol. 312, pp. 693- 703 ,(2002) , 10.1016/S0304-3975(03)00400-6
Sumit Ganguly, Counting Distinct Items over Update Streams Algorithms and Computation. pp. 505- 514 ,(2005) , 10.1007/11602613_51
Piotr Indyk, Rajeev Motwani, High-dimensional computational geometry ,(2000)
Zaoxing Liu, Nikita Ivkin, Lin Yang, Mark Neyrinck, Gerard Lemson, Alexander Szalay, Vladimir Braverman, Tamas Budavari, Randal Burns, Xin Wang, Streaming Algorithms for Halo Finders international conference on e-science. pp. 342- 351 ,(2015) , 10.1109/ESCIENCE.2015.73
Sariel Har-Peled, Akash Kushal, Smaller coresets for k-median and k-means clustering symposium on computational geometry. pp. 126- 134 ,(2005) , 10.1145/1064092.1064114
Dan Feldman, Michael Langberg, A unified framework for approximating and clustering data symposium on the theory of computing. pp. 569- 578 ,(2011) , 10.1145/1993636.1993712
Joan Feigenbaum, Sampath Kannan, Jian Zhang, Computing Diameter in the Streaming and Sliding-WindowModels Algorithmica. ,vol. 41, pp. 25- 41 ,(2005) , 10.1007/S00453-004-1105-2
Piotr Indyk, Better algorithms for high-dimensional proximity problems via asymmetric embeddings symposium on discrete algorithms. pp. 539- 545 ,(2003) , 10.5555/644108.644200
Piotr Indyk, Eric Price, K-median clustering, model-based compressive sensing, and sparse recovery for earth mover distance symposium on the theory of computing. pp. 627- 636 ,(2011) , 10.1145/1993636.1993720