Statistical properties of sketching algorithms

作者: William J. Astle , Daniel Ahfock , Sylvia Richardson

DOI:

关键词:

摘要: Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating smaller surrogate dataset. Typically, inference proceeds compressed generally use random projections to compress original dataset and stochastic generation process makes them amenable statistical analysis. We argue sketched modelled as sample, thus placing family of methods firmly within an inferential framework. In particular, we focus Gaussian, Hadamard Clarkson-Woodruff sketches, their single pass for linear regression with huge $n$. explore properties derive new distributional results large class estimators. A key result conditional central limit theorem oblivious sketches. An important finding best choice algorithm terms mean square error related signal noise ratio source Finally, demonstrate theory limits its applicability two real datasets.

参考文章(45)
Galen R. Shorack, Probability for Statisticians ,(2000)
Robert M. Gower, Peter Richtárik, Randomized Iterative Methods for Linear Systems arXiv: Numerical Analysis. ,(2015) , 10.1137/15M1025487
Garvesh Raskutti, Michael Mahoney, A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares arXiv: Machine Learning. ,(2014)
Michael W. Mahoney, Jiyan Yang, Xiangrui Meng, Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments arXiv: Distributed, Parallel, and Cluster Computing. ,(2015)
Persi Diaconis, David Freedman, Asymptotics of Graphical Projection Pursuit Annals of Statistics. ,vol. 12, pp. 793- 815 ,(1984) , 10.1214/AOS/1176346703
Leon Jay Gleser, Morris L. Eaton, Multivariate statistics : a vector space approach Journal of the American Statistical Association. ,vol. 80, pp. 1069- ,(1985) , 10.2307/2288585
Rajarshi Guhaniyogi, David B. Dunson, Bayesian Compressed Regression Journal of the American Statistical Association. ,vol. 110, pp. 1500- 1514 ,(2015) , 10.1080/01621459.2014.969425
Alexander R. Pruss, Dominik Szynal, On the central limit theorem for negatively correlated random variables with negatively correlated squares Stochastic Processes and their Applications. ,vol. 87, pp. 299- 309 ,(2000) , 10.1016/S0304-4149(99)00115-5