On b-bit min-wise hashing for large-scale regression and classification with sparse data

作者: Nicolai Meinshausen , Rajen D. Shah

DOI:

关键词:

摘要: Large-scale regression problems where both the number of variables, $p$, and observations, $n$, may be large in order millions or more, are becoming increasingly more common. Typically data sparse: only a fraction percent entries design matrix non-zero. Nevertheless, often computationally feasible approach is to perform dimension reduction obtain new with far fewer columns then work this compressed data. $b$-bit min-wise hashing (Li Konig, 2011) promising scheme for sparse matrices which produces set random features such that on resulting approximates kernel resemblance kernel. In work, we derive bounds prediction error regressions. For linear logistic models show average vanishes asymptotically as long $q \|\beta^*\|_2^2 /n \rightarrow 0$, $q$ non-zero each row $\beta^*$ coefficient predictor. We also ordinary least squares ridge applied reduced can fact allow us fit flexible models. We non-asymptotic interaction an unknown normalisation must signal predictors.

参考文章(51)
J. Michael Steele, The Cauchy-Schwarz Master Class ,(2004)
Tamas Sarlos, Alexander Smola, Quoc Le, Fastfood - Computing Hilbert Space Expansions in loglinear time international conference on machine learning. pp. 244- 252 ,(2013)
Santosh Vempala, The random projection method American Mathematical Society. ,(2005) , 10.1090/DIMACS/065
Peter Lukas Bühlmann, Boosting for high-dimensional linear models Annals of Statistics. ,vol. 34, pp. 559- 583 ,(2006) , 10.3929/ETHZ-A-004680132
Petros Drineas, Michael W. Mahoney, S. Muthukrishnan, Tamás Sarlós, Faster least squares approximation Numerische Mathematik. ,vol. 117, pp. 219- 249 ,(2011) , 10.1007/S00211-010-0331-6
Hui Zou, Runze Li, One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. Annals of Statistics. ,vol. 36, pp. 1509- 1533 ,(2008) , 10.1214/009053607000000802
Ping Li, Trevor J. Hastie, Kenneth W. Church, Very sparse random projections knowledge discovery and data mining. pp. 287- 296 ,(2006) , 10.1145/1150402.1150436