Streaming, Memory-limited PCA

作者: Ioannis Mitliagkas , Constantine Caramanis , Prateek Jain

DOI:

关键词:

摘要: In this paper, we consider a streaming one-pass-over-the-data model for Principal Component Analysis (PCA). The input, in case, is stream ofp-dimensional vectors, and the output collection of k, p-dimensional principal components that span best approximating subspace. Consequently, minimum memory requirement such problems O(kp). Yet standard PCA algorithm requires us to form empirical covariance matrix, typically dense p hence requiring O(p 2 ) memory. Although there exist several incremental algorithms require O(kp) memory, our understanding, these methods currently do not have known nite-sample performance bounds. That is, high-dimensional setting where number samples dimensionality scale together, no provably correct algorithm. This paper considers simple but important problem. We give what knowledge, rst only makes single pass over data, whose matches batch up logarithmic factors.

参考文章(21)
Tracking the best linear predictor Journal of Machine Learning Research. ,vol. 1, pp. 281- 309 ,(2001) , 10.1162/153244301753683726
Laura Balzano, John C. S. Lui, Jun He, Online Robust Subspace Tracking from Partial Information arXiv: Information Theory. ,(2011)
Herbert Robbins, Sutton Monro, A Stochastic Approximation Method Annals of Mathematical Statistics. ,vol. 22, pp. 400- 407 ,(1951) , 10.1214/AOMS/1177729586
Erkki Oja, Juha Karhunen, On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix Journal of Mathematical Analysis and Applications. ,vol. 106, pp. 69- 84 ,(1985) , 10.1016/0022-247X(85)90131-3
Kenneth L. Clarkson, David P. Woodruff, Numerical linear algebra in the streaming model Proceedings of the 41st annual ACM symposium on Symposium on theory of computing - STOC '09. pp. 205- 214 ,(2009) , 10.1145/1536414.1536445
Hongyuan Zha, Horst D. Simon, On Updating Problems in Latent Semantic Indexing SIAM Journal on Scientific Computing. ,vol. 21, pp. 782- 791 ,(1999) , 10.1137/S1064827597329266
Roman Vershynin, How Close is the Sample Covariance Matrix to the Actual Covariance Matrix Journal of Theoretical Probability. ,vol. 25, pp. 655- 686 ,(2012) , 10.1007/S10959-010-0338-Z
Sam T. Roweis, EM Algorithms for PCA and SPCA neural information processing systems. ,vol. 10, pp. 626- 632 ,(1997)
Matthew Brand, None, Incremental Singular Value Decomposition of Uncertain Data with Missing Values european conference on computer vision. pp. 707- 720 ,(2002) , 10.1007/3-540-47969-4_47
Haitao Zhao, Pong Chi Yuen, J.T. Kwok, A novel incremental principal component analysis and its application for face recognition systems man and cybernetics. ,vol. 36, pp. 873- 886 ,(2006) , 10.1109/TSMCB.2006.870645