An $\ell_p$ theory of PCA and spectral clustering.

作者: Emmanuel Abbe , Jianqing Fan , Kaizheng Wang

DOI:

关键词: MathematicsSpectral clusteringEigenvalues and eigenvectorsPrincipal component analysisNorm (mathematics)Gramian matrixHilbert spaceMixture modelGaussianDiscrete mathematics

摘要: Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing study of PCA focuses on the recovery principal components their associated eigenvalues, there are few precise characterizations individual component scores that yield low-dimensional embedding samples. That hinders analysis various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for hollowed version Hilbert spaces which provably improves upon vanilla presence heteroscedastic noises. Through novel eigenvectors, investigate entrywise behaviors score vectors show they can be approximated by linear functionals Gram matrix norm, includes $\ell_2$ $\ell_\infty$ as special examples. For sub-Gaussian mixture models, choice $p$ giving optimal bounds depends signal-to-noise ratio, further yields optimality guarantees clustering. contextual community detection, leads to simple algorithm achieves information threshold exact recovery. These also provide results Gaussian stochastic block models cases.

参考文章(88)
Stéphane Boucheron, Gábor Lugosi, Pascal Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence ,(2013)
Ji-guang Sun, G. W. Stewart, Matrix perturbation theory ,(1990)
Iain M. Johnstone, On the distribution of the largest eigenvalue in principal components analysis Annals of Statistics. ,vol. 29, pp. 295- 327 ,(2001) , 10.1214/AOS/1009210544
Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller, Kernel Principal Component Analysis international conference on artificial neural networks. pp. 583- 588 ,(1997) , 10.1007/BFB0020217
Jürgen Gärtner, On Large Deviations from the Invariant Measure Theory of Probability & Its Applications. ,vol. 22, pp. 24- 39 ,(1977) , 10.1137/1122003
Chandler Davis, W. M. Kahan, The Rotation of Eigenvectors by a Perturbation. III SIAM Journal on Numerical Analysis. ,vol. 7, pp. 1- 46 ,(1970) , 10.1137/0707001
Santosh Vempala, Grant Wang, A spectral algorithm for learning mixture models foundations of computer science. ,vol. 68, pp. 841- 860 ,(2004) , 10.1016/J.JCSS.2003.11.008
N. Aronszajn, Theory of Reproducing Kernels. Transactions of the American Mathematical Society. ,vol. 68, pp. 337- 404 ,(1950) , 10.1090/S0002-9947-1950-0051437-7