Authors: Emmanuel Abbe, Jianqing Fan, Kaizheng Wang
DOI:
Keywords: Mathematics, Spectral clustering, Eigenvalues and eigenvectors, Principal component analysis, Norm (mathematics), Gramian matrix, Hilbert space, Mixture model, Gaussian, Discrete mathematics
Abstract: Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing study of PCA focuses on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores that yield low-dimensional embedding of samples. That hinders the analysis of various spectral methods. In this paper, we first develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon the vanilla PCA in the presence of heteroscedastic noises. Through a novel $\ell_p$ analysis of eigenvectors, we investigate entrywise behaviors of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm, which includes $\ell_2$ and $\ell_\infty$ as special examples. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These also provide optimal recovery results for Gaussian mixture and stochastic block models as special cases.
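
Illustration (not from the paper): the abstract refers to a "hollowed" PCA built on the Gram matrix. One common reading, assumed here, is that the diagonal of the sample Gram matrix is set to zero before its leading eigenvectors are taken, so that per-sample noise energy sitting on the diagonal does not bias the spectrum. The sketch below follows that reading for a symmetric two-component mixture. The function names `hollowed_pca_scores` and `two_cluster_labels`, the sign-based clustering rule, and the choice to rank eigenvalues algebraically are all assumptions made for illustration, not the authors' stated procedure.

```python
import numpy as np


def hollowed_pca_scores(X, k):
    """Return the top-k eigenvalues and eigenvectors of the hollowed Gram matrix.

    X has shape (n_samples, n_features). "Hollowed" is taken here to mean
    that the diagonal of the Gram matrix X X^T is replaced by zeros
    (an assumption about the construction named in the abstract).
    """
    G = X @ X.T
    np.fill_diagonal(G, 0.0)          # hollowing step: drop the diagonal
    vals, vecs = np.linalg.eigh(G)    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]  # indices of the k largest eigenvalues
    return vals[top], vecs[:, top]


def two_cluster_labels(X):
    """Cluster into two groups by the sign of the leading score vector.

    This sign rule is a simple illustrative choice for a symmetric
    two-component mixture; labels are only defined up to a global flip.
    """
    _, U = hollowed_pca_scores(X, 1)
    return np.where(U[:, 0] >= 0, 1, -1)


if __name__ == "__main__":
    # Synthetic two-component mixture: x_i = y_i * mu + standard normal noise.
    rng = np.random.default_rng(0)
    n, d = 200, 50
    y = rng.choice([-1, 1], size=n)
    mu = np.full(d, 8.0 / np.sqrt(d))  # mean vector with Euclidean norm 8
    X = np.outer(y, mu) + rng.standard_normal((n, d))
    pred = two_cluster_labels(X)
    acc = max(np.mean(pred == y), np.mean(pred == -y))
    print(f"clustering accuracy (up to label flip): {acc:.3f}")
```

The accuracy is computed up to a global label flip because an eigenvector is only determined up to sign.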