作者: Vladimir Braverman , Daniela Rus , Harry Lang , Dan Feldman , Adiel Statman
DOI:
关键词:
摘要: In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $\epsilon$-coreset is (much smaller) whose sum square distances the to any hyperplane approximates that original $1 \pm \epsilon$ factor. Our main result we can maintain while storing only $O(d \log^2 d / \epsilon^2)$ rows. Known lower bounds $\Omega(d show this nearly optimal. Moreover, each row our coreset weighted subset input This highly desirable since it: (1) preserves sparsity; (2) easily interpretable; (3) avoids precision errors; (4) applies problems with constraints on input. Previous results for SVD return required \log^3 n where $n$ number seen so far. algorithm, storage independent $n$, first uses finite memory streams. We support findings experiments Wikipedia dataset benchmarked against state-of-the-art algorithms.