Sparse Coresets for SVD on Infinite Streams

Authors: Vladimir Braverman, Daniela Rus, Harry Lang, Dan Feldman, Adiel Statman

Abstract: In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $\epsilon$-coreset is a (much smaller) matrix whose sum of squared distances of its rows to any hyperplane approximates that of the original matrix to within a $1 \pm \epsilon$ factor. Our main result is that such a coreset can be maintained while storing only $O(d \log^2 d / \epsilon^2)$ rows. Known lower bounds of $\Omega(d / \epsilon^2)$ rows show that this is nearly optimal. Moreover, each row of our coreset is a weighted row of the input. This is highly desirable since it: (1) preserves sparsity; (2) is easily interpretable; (3) avoids precision errors; (4) applies to problems with constraints on the input. Previous results for SVD that return a subset of the input required memory that grows with $\log^3 n$, where $n$ is the number of rows seen so far. Our algorithm, with storage independent of $n$, is the first that uses finite memory on infinite streams. We support our findings with experiments on the Wikipedia dataset, benchmarked against state-of-the-art algorithms.
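To make the coreset guarantee concrete: for a hyperplane through the origin with unit normal $x$, the sum of squared distances of the rows of $A$ to it is $\|Ax\|^2$, so a weighted row subset $(C, w)$ is an $\epsilon$-coreset when $\sum_i w_i (c_i \cdot x)^2$ lies within $(1 \pm \epsilon)\|Ax\|^2$ for every $x$. The sketch below checks this property empirically using plain leverage-score sampling; this sampler, and all sizes in it, are illustrative assumptions, not the paper's streaming algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))            # rows are points in R^d

# Leverage scores l_i = ||U_i||^2 from the thin SVD A = U S V^T.
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U ** 2, axis=1)               # sums to rank(A) = d
q = lev / lev.sum()                        # sampling distribution

m = 800                                    # coreset size (illustrative)
idx = rng.choice(n, size=m, p=q)           # i.i.d. draws with replacement
w = 1.0 / (m * q[idx])                     # unbiased importance weights
C = A[idx]                                 # a weighted subset of input rows

# Check the guarantee on random directions: for each unit x,
#   sum_i w_i (c_i . x)^2  should approximate  ||A x||^2.
errs = []
for _ in range(100):
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    exact = np.sum((A @ x) ** 2)
    approx = np.sum(w * (C @ x) ** 2)
    errs.append(abs(approx - exact) / exact)
print(f"max relative error over 100 directions: {max(errs):.3f}")
```

Because the coreset rows are (weighted) rows of `A`, any zero entries of the input survive in `C`, which is the sparsity-preservation point the abstract highlights.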
