SLOPE - Adaptive Variable Selection via Convex Optimization

Authors: Małgorzata Bogdan, Ewout van den Berg, Chiara Sabatti, Weijie Su, Emmanuel J. Candès

DOI: 10.1214/15-AOAS842

Keywords: Quantile, Combinatorics, Normal distribution, Linear model, Lasso (statistics), Linear regression, Convex optimization, False discovery rate, Estimator, Mathematics, Statistics

Abstract: We introduce a new estimator for the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to $\min_{b \in \mathbb{R}^p} \tfrac{1}{2}\|y - Xb\|_{\ell_2}^2 + \lambda_1 |b|_{(1)} + \lambda_2 |b|_{(2)} + \cdots + \lambda_p |b|_{(p)}$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and $|b|_{(1)} \ge |b|_{(2)} \ge \cdots \ge |b|_{(p)}$ are the decreasing absolute values of the entries of b. This is a convex program, and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical ℓ1 procedures such as the Lasso. Here, the regularizer is a sorted ℓ1 norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH), which compares more significant p-values with more stringent thresholds. One notable choice of the sequence {λ_i} is given by the BH critical values $\lambda_{\mathrm{BH}}(i) = z(1 - i \cdot q/(2p))$, where q ∈ (0, 1) and z(α) is the α-quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with λ_BH provably controls the FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments running on both simulated and real data.
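
The quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes the BH critical values take the form λ_i = z(1 − i·q/(2p)) for i = 1, …, p and that the objective is half the squared residual norm plus the sorted ℓ1 penalty. The function names (`bh_lambdas`, `sorted_l1_norm`, `slope_objective`) are hypothetical and chosen only for this illustration; the code evaluates the objective and does not implement the authors' solution algorithm.

```python
from statistics import NormalDist


def bh_lambdas(p, q):
    """Assumed BH-type sequence: lambda_i = z(1 - i*q/(2p)), i = 1..p,
    where z is the standard normal quantile function."""
    z = NormalDist().inv_cdf
    return [z(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(b, lambdas):
    """Sum of lambda_i * |b|_(i), pairing the largest lambda
    with the largest absolute coefficient."""
    magnitudes = sorted((abs(x) for x in b), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))


def slope_objective(y, X, b, lambdas):
    """0.5 * ||y - X b||^2 + sorted L1 penalty; X is a list of rows."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, b))
        for yi, row in zip(y, X)
    ]
    return 0.5 * sum(r * r for r in residuals) + sorted_l1_norm(b, lambdas)


if __name__ == "__main__":
    lambdas = bh_lambdas(p=5, q=0.1)
    print([round(lam, 3) for lam in lambdas])      # non-increasing sequence
    print(sorted_l1_norm([1.0, -3.0, 2.0], [3.0, 2.0, 1.0]))
```

When all λ_i are equal, `sorted_l1_norm` reduces to a multiple of the ordinary ℓ1 norm, which is the Lasso penalty. A decreasing sequence instead charges larger coefficients a higher price per unit, mirroring how BH applies stricter thresholds to more significant p-values.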
