Authors: William J. Astle, Daniel Ahfock, Sylvia Richardson
DOI:
Keywords:
Abstract: Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a smaller surrogate dataset. Typically, inference proceeds on the compressed dataset. Sketching algorithms generally use random projections to compress the original dataset, and this stochastic generation process makes them amenable to statistical analysis. We argue that the sketched data can be modelled as a random sample, thus placing this family of data compression methods firmly within an inferential framework. In particular, we focus on the Gaussian, Hadamard and Clarkson-Woodruff sketches, and their use in single-pass sketching algorithms for linear regression with huge $n$. We explore the statistical properties of sketched regression algorithms and derive new distributional results for a large class of sketched estimators. A key result is a conditional central limit theorem for data-oblivious sketches. An important finding is that the best choice of sketching algorithm in terms of mean squared error is related to the signal-to-noise ratio in the source dataset. Finally, we demonstrate the theory and the limits of its applicability on two real datasets.
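Sketched regression replaces the $n \times p$ problem $(X, y)$ with a compressed problem $(SX, Sy)$. Here $S$ is a random $k \times n$ matrix with $k \ll n$, and the sketched estimator is $\hat\beta_S = (X^\top S^\top S X)^{-1} X^\top S^\top S y$. The code below is a minimal pure-Python sketch under stated assumptions. It is not the authors' implementation. It applies the three sketch families named in the abstract to the augmented matrix $[X \mid y]$, so the same $S$ compresses both $X$ and $y$. Several details are illustrative choices: the function names (`gaussian_sketch`, `hadamard_sketch`, `clarkson_woodruff_sketch`, `least_squares`), the $N(0, 1/k)$ scaling, sampling rows with replacement in the Hadamard sketch, and the simulated data.

```python
import math
import random

# Hypothetical helper names throughout; A is a list of rows [x_1, ..., x_p, y],
# so every sketch applies the SAME random S to both X and y.


def gaussian_sketch(A, k, rng):
    """Gaussian sketch: S is k x n with i.i.d. N(0, 1/k) entries; returns S A."""
    n, d = len(A), len(A[0])
    out = []
    for _ in range(k):
        row = [0.0] * d
        for i in range(n):
            g = rng.gauss(0.0, 1.0) / math.sqrt(k)
            for j in range(d):
                row[j] += g * A[i][j]
        out.append(row)
    return out


def clarkson_woodruff_sketch(A, k, rng):
    """Clarkson-Woodruff (CountSketch): one pass over the rows; each row is
    added, with a random sign, to one uniformly chosen output row."""
    d = len(A[0])
    out = [[0.0] * d for _ in range(k)]
    for row in A:
        bucket = rng.randrange(k)
        sign = rng.choice((-1.0, 1.0))
        for j in range(d):
            out[bucket][j] += sign * row[j]
    return out


def _fwht(v):
    """Unnormalised fast Walsh-Hadamard transform; len(v) must be a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for i in range(start, start + h):
                v[i], v[i + h] = v[i] + v[i + h], v[i] - v[i + h]
        h *= 2
    return v


def hadamard_sketch(A, k, rng):
    """Subsampled randomised Hadamard transform: S = sqrt(n/k) P H D with
    D random signs, H the orthonormal Hadamard matrix, P uniform row sampling
    (with replacement here, an assumption)."""
    n, d = len(A), len(A[0])
    if n & (n - 1):
        raise ValueError("this sketch assumes n is a power of 2")
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    cols = [_fwht([signs[i] * A[i][j] for i in range(n)]) for j in range(d)]
    picks = [rng.randrange(n) for _ in range(k)]
    scale = 1.0 / math.sqrt(k)  # sqrt(n/k) times the 1/sqrt(n) that normalises H
    return [[scale * cols[j][r] for j in range(d)] for r in picks]


def least_squares(A):
    """OLS on A = [X | y] via the normal equations and Gaussian elimination
    with partial pivoting (fine for small p; QR is more stable in general)."""
    p = len(A[0]) - 1
    # Augmented system [X^T X | X^T y], p rows by p + 1 columns.
    M = [[sum(r[i] * r[j] for r in A) for j in range(p + 1)] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for j in range(c, p + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 1024, 128  # n must be a power of 2 for hadamard_sketch
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Simulated rows [1, x_i, y_i] with y_i = 1 + 2 x_i + N(0, 1) noise.
    A = [[1.0, x, 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)] for x in xs]
    print("full data:", least_squares(A))
    for name, sketch in [("gaussian", gaussian_sketch),
                         ("hadamard", hadamard_sketch),
                         ("clarkson-woodruff", clarkson_woodruff_sketch)]:
        print(name, least_squares(sketch(A, k, rng)))
```

On this simulated example, all three sketched fits should land near the full-data fit. They differ mainly in cost. The dense Gaussian sketch costs $O(nkp)$ and the Hadamard sketch $O(np \log n)$. The Clarkson-Woodruff sketch touches each row once, which is what suits it to single-pass processing of very large $n$.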