Min-wise hashing for large-scale regression and classification with sparse data

Authors: Nicolai Meinshausen, Rajen D. Shah

DOI:

Keywords: Regression analysis, Sparse matrix, Regression, Ordinary least squares, Mathematics, Estimator, Hash function, Statistics, Algorithm, Contrast (statistics), Context (language use)

Abstract: We study large-scale regression analysis where both the number of variables, p, and the number of observations, n, may be large, in the order of millions or more. This is very different from the now well-studied high-dimensional regression context of "large p, small n". For example, in our "large p, large n" setting, an ordinary least squares estimator may be inappropriate for computational, rather than statistical, reasons. In order to make progress, one must seek a compromise between statistical and computational efficiency. Furthermore, in contrast to the common assumption of signal sparsity in high-dimensional data, here it is the design matrices that are typically sparse in applications. Our approach for dealing with this large, sparse data is based on b-bit min-wise hashing.
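To make the idea concrete, the following is a minimal Python sketch of b-bit min-wise hashing applied to a sparse binary design matrix. The simple modular hash family, the parameter names, and the suggested one-hot expansion of the b-bit values are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def b_bit_minhash(X_rows, num_hashes=10, b=2, seed=0):
    """Illustrative b-bit min-wise hashing of a sparse binary design matrix.

    X_rows: list of sets, each holding the indices of the non-zero
            (active) variables for one observation.
    Returns an (n, num_hashes) array with the lowest b bits of each
    min-hash value; these values could, for example, be one-hot expanded
    into 2^b dummy variables to form a compressed design matrix for
    least squares or logistic regression.
    """
    rng = np.random.default_rng(seed)
    n = len(X_rows)
    # Random parameters for num_hashes approximate min-wise independent
    # permutations of the column indices (simple modular hash family).
    large_prime = 2_147_483_647
    a = rng.integers(1, large_prime, size=num_hashes)
    c = rng.integers(0, large_prime, size=num_hashes)

    sketches = np.zeros((n, num_hashes), dtype=np.int64)
    for i, nz in enumerate(X_rows):
        idx = np.fromiter(nz, dtype=np.int64)
        for h in range(num_hashes):
            # Permuted positions of the non-zero columns; keep the minimum.
            min_val = ((a[h] * idx + c[h]) % large_prime).min()
            # Retain only the lowest b bits of the minimum ("b-bit" hashing).
            sketches[i, h] = min_val & ((1 << b) - 1)
    return sketches

# Toy usage: 3 observations over a very sparse set of binary predictors.
rows = [{1, 7, 100_003}, {7, 42, 100_003}, {5}]
S = b_bit_minhash(rows, num_hashes=4, b=2)
print(S.shape)  # (3, 4)
```

The point of the sketch is the dimension reduction: each observation is summarised by num_hashes values of b bits each, regardless of how large p is, so downstream regression only ever sees the compressed representation.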

References (48)
J. Michael Steele, The Cauchy-Schwarz Master Class (2004)
Stéphane Boucheron, Gábor Lugosi, Pascal Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence (2013)
Ata Kabán, Robert J. Durrant, Dimension-adaptive bounds on compressive FLD classification. Algorithmic Learning Theory, pp. 294–308 (2013), 10.1007/978-3-642-40935-6_21
Mayur Datar, S. Muthukrishnan, Estimating rarity and similarity over data stream windows. European Symposium on Algorithms, pp. 323–334 (2002), 10.1007/3-540-45749-6_31
Andrei Z. Broder, Moses Charikar, Alan M. Frieze, Michael Mitzenmacher, Min-wise independent permutations (extended abstract). Symposium on the Theory of Computing, pp. 327–336 (1998), 10.1145/276698.276781
Peter Lukas Bühlmann, Boosting for high-dimensional linear models. Annals of Statistics, vol. 34, pp. 559–583 (2006), 10.3929/ETHZ-A-004680132
Petros Drineas, Michael W. Mahoney, S. Muthukrishnan, Tamás Sarlós, Faster least squares approximation. Numerische Mathematik, vol. 117, pp. 219–249 (2011), 10.1007/S00211-010-0331-6