High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence

Authors: Pradeep Ravikumar, Martin J. Wainwright, Garvesh Raskutti, Bin Yu

DOI: 10.1214/11-EJS631

Keywords: Norm (mathematics), Estimator, Multivariate random variable, Combinatorics, Multivariate normal distribution, Covariance, Operator norm, Mathematics, Estimation of covariance matrices, Covariance matrix, Statistics

Abstract: Given i.i.d. observations of a random vector X ∈ R^p, we study the problem of estimating both its covariance matrix Σ* and its inverse covariance or concentration matrix Θ* = (Σ*)⁻¹. We estimate Θ* by minimizing an ℓ1-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds to ℓ1-penalized maximum likelihood, and the structure of Θ* is specified by the graph of an associated Gaussian Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of nodes p in the graph, the number of edges s, and the maximum node degree d are allowed to grow as a function of the sample size n. In addition to the parameters (p, s, d), our analysis identifies other key quantities that control the rates: (a) the ℓ∞-operator norm of the true covariance matrix Σ*; (b) the ℓ∞-operator norm of the submatrix Γ*_SS, where S indexes the graph edges; (c) a mutual incoherence or irrepresentability measure on the matrix Γ*; and (d) the rate of decay 1/f(n, δ) on the probabilities {|Σ̂ⁿ_ij − Σ*_ij| > δ}, where Σ̂ⁿ is the sample covariance based on n samples. Our first result establishes consistency of our estimator in the elementwise maximum norm. This in turn allows us to derive convergence rates in the Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees d = o(√s). In our second result, we show that with probability converging to one, the estimator correctly specifies the zero pattern of the concentration matrix Θ*. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondence between the theoretical predictions and the behavior in simulations.

1. Introduction. The area of high-dimensional statistics deals with estimation in the "large p, small n" setting, where p and n correspond, respectively, to the dimensionality of the data and the sample size. Such problems arise in a variety of applications, among them remote sensing, computational biology and natural language processing, where the model dimension may be comparable to or substantially larger than the sample size. It is well known that such scaling can lead to dramatic breakdowns in many classical procedures. In the absence of additional model assumptions, it is frequently impossible to obtain consistent procedures when p ≫ n. Accordingly, an active line of statistical research is based on imposing various restrictions on the model (for instance, sparsity, manifold structure, or graphical model structure) and then studying the scaling behavior of different estimators as a function of the sample size n, the ambient dimension p, and additional parameters related to these structural assumptions.
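The ℓ1-penalized log-determinant program described in the abstract (minimize trace(Σ̂ⁿΘ) − log det Θ + λ‖Θ‖₁ over positive definite Θ) is commonly known as the graphical lasso. The following is a minimal sketch of it using scikit-learn's `GraphicalLasso`; the chain-graph precision matrix, the sample size, and the penalty value `alpha=0.05` are illustrative choices, not taken from the paper.

```python
# Sketch of the l1-penalized log-determinant (graphical lasso) estimator.
# The toy precision matrix, sample size, and penalty level are assumptions
# for illustration, not values from the paper.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Toy sparse concentration matrix Theta* for p = 4 nodes: a chain graph,
# so Theta*[i, j] = 0 whenever |i - j| > 1 (maximum node degree d = 2).
theta_true = np.array([
    [2.0, 0.6, 0.0, 0.0],
    [0.6, 2.0, 0.6, 0.0],
    [0.0, 0.6, 2.0, 0.6],
    [0.0, 0.0, 0.6, 2.0],
])
sigma_true = np.linalg.inv(theta_true)  # covariance Sigma* = (Theta*)^{-1}

# Draw n i.i.d. Gaussian samples; in the Gaussian case the estimator is
# l1-penalized maximum likelihood for the Markov random field structure.
n = 2000
X = rng.multivariate_normal(np.zeros(4), sigma_true, size=n)

model = GraphicalLasso(alpha=0.05).fit(X)  # alpha = l1 penalty weight
theta_hat = model.precision_

# Entries outside the true edge set S should be shrunk toward zero,
# while on-edge entries remain clearly nonzero.
print(np.round(theta_hat, 2))
```

Larger values of `alpha` zero out more off-diagonal entries; the paper's second result concerns exactly this sparsistency, i.e., when the estimated zero pattern matches that of Θ*.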

References (33)
Yair Censor, Stavros A. Zenios, Parallel Optimization: Theory, Algorithms, and Applications (1997)
Iain M. Johnstone, On the distribution of the largest eigenvalue in principal components analysis, Annals of Statistics, vol. 29, pp. 295–327 (2001), 10.1214/AOS/1009210544
Haskell P. Rosenthal, On the subspaces of L_p (p > 2) spanned by sequences of independent random variables, Israel Journal of Mathematics, vol. 8, pp. 273–303 (1970), 10.1007/BF02771562
Shuheng Zhou, John Lafferty, Larry Wasserman, Time varying undirected graphs, Machine Learning, vol. 80, pp. 295–319 (2010), 10.1007/S10994-010-5180-0
Clifford Lam, Jianqing Fan, Sparsistency and rates of convergence in large covariance matrix estimation, Annals of Statistics, vol. 37, pp. 4254–4278 (2009), 10.1214/09-AOS720
Adam J. Rothman, Elizaveta Levina, Peter J. Bickel, Ji Zhu, Sparse permutation invariant covariance estimation, Electronic Journal of Statistics, vol. 2, pp. 494–515 (2008), 10.1214/08-EJS176
Nicolai Meinshausen, Peter Bühlmann, High-dimensional graphs and variable selection with the Lasso, Annals of Statistics, vol. 34, pp. 1436–1462 (2006), 10.1214/009053606000000281