Sparse Coresets for SVD on Infinite Streams

Authors: Vladimir Braverman, Daniela Rus, Harry Lang, Dan Feldman, Adiel Statman

Abstract: In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $\epsilon$-coreset is a (much smaller) matrix whose sum of squared distances of its rows to any hyperplane approximates that of the original matrix to within a $1 \pm \epsilon$ factor. Our main result is that such a coreset can be maintained while storing only $O(d \log^2 d / \epsilon^2)$ rows. Known lower bounds of $\Omega(d / \epsilon^2)$ rows show that this is nearly optimal. Moreover, each row of our coreset is a weighted row of the input. This is highly desirable since it: (1) preserves sparsity; (2) is easily interpretable; (3) avoids precision errors; (4) applies to problems with constraints on the input. Previous results for SVD that return a subset of the input required memory that grows with $\log^3 n$, where $n$ is the number of rows seen so far. Our algorithm, with storage independent of $n$, is the first that uses finite memory on infinite streams. We support our findings with experiments on the Wikipedia dataset, benchmarked against state-of-the-art algorithms.
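To make the coreset guarantee concrete: for a hyperplane through the origin with unit normal $x$, the sum of squared distances of the rows of $A$ to it is $\|Ax\|^2$, so a weighted row subset $(C, w)$ is an $\epsilon$-coreset when $\sum_i w_i (c_i \cdot x)^2$ lies within $(1 \pm \epsilon)\|Ax\|^2$ for every $x$. The sketch below checks this property empirically using plain leverage-score sampling; this sampler, and all sizes in it, are illustrative assumptions, not the paper's streaming algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))            # rows are points in R^d

# Leverage scores l_i = ||U_i||^2 from the thin SVD A = U S V^T.
U, _, _ = np.linalg.svd(A, full_matrices=False)
lev = np.sum(U ** 2, axis=1)               # sums to rank(A) = d
q = lev / lev.sum()                        # sampling distribution

m = 800                                    # coreset size (illustrative)
idx = rng.choice(n, size=m, p=q)           # i.i.d. draws with replacement
w = 1.0 / (m * q[idx])                     # unbiased importance weights
C = A[idx]                                 # a weighted subset of input rows

# Check the guarantee on random directions: for each unit x,
#   sum_i w_i (c_i . x)^2  should approximate  ||A x||^2.
errs = []
for _ in range(100):
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)
    exact = np.sum((A @ x) ** 2)
    approx = np.sum(w * (C @ x) ** 2)
    errs.append(abs(approx - exact) / exact)
print(f"max relative error over 100 directions: {max(errs):.3f}")
```

Because the coreset rows are (weighted) rows of `A`, any zero entries of the input survive in `C`, which is the sparsity-preservation point the abstract highlights.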
