Authors: Trevor J. Hastie, Kenneth W. Church, Ping Li
DOI:
Keywords: Hash function, Sketch, Sparse approximation, K-SVD, Sparse matrix, Randomized algorithm, Histogram, Computer science, Sampling (statistics), Algorithm
Abstract: We propose a sketch-based sampling algorithm that effectively exploits data sparsity. Sampling methods have become popular in large-scale data mining and information retrieval, where high sparsity is the norm. A distinctive feature of our algorithm is that it combines the advantages of both conventional random sampling and more modern randomized algorithms such as locality-sensitive hashing (LSH). While most such algorithms are designed for specific summary statistics, the proposed algorithm is a general-purpose technique, useful for estimating any summary statistics, including two-way and multi-way distances and joint histograms.
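
The abstract above does not spell out the algorithm's steps, so the following is only a rough illustration of what sketch-based sampling on sparse data can look like, not the authors' implementation. It assumes a scheme in which column IDs are randomly permuted, each sparse row keeps its k nonzero entries with the smallest permuted IDs, and two sketches are compared only over the ID range in which both are complete. The function names, the parameters `k` and `D`, and the inner-product estimator are all hypothetical choices made for this sketch.

```python
import random


def build_sketch(row, perm, k):
    """Sketch one sparse row (illustrative assumption, not the paper's code).

    row  : dict mapping 0-based column index -> nonzero value
    perm : list where perm[j] is the permuted ID (1..D) of column j
    k    : maximum number of entries to keep (k >= 1)
    Returns (entries, limit): entries are (permuted_id, value) pairs sorted by
    ID, and every nonzero of the row with permuted ID <= limit is in entries.
    """
    entries = sorted((perm[j], v) for j, v in row.items())
    if len(entries) <= k:
        return entries, len(perm)      # whole row kept: complete up to D
    entries = entries[:k]
    return entries, entries[-1][0]     # complete up to the largest kept ID


def estimate_inner_product(sketch_a, sketch_b, D):
    """Estimate the inner product of two rows from their sketches."""
    entries_a, limit_a = sketch_a
    entries_b, limit_b = sketch_b
    # Both sketches are complete for permuted IDs 1..d_s, so those columns
    # are treated as a sample of d_s columns out of D.
    d_s = min(limit_a, limit_b)
    a = {i: v for i, v in entries_a if i <= d_s}
    b = {i: v for i, v in entries_b if i <= d_s}
    sample_dot = sum(v * b[i] for i, v in a.items() if i in b)
    return sample_dot * D / d_s        # scale the sample up to all D columns


if __name__ == "__main__":
    D, k = 1000, 50                    # hypothetical sizes for the demo
    rng = random.Random(0)
    perm = list(range(1, D + 1))
    rng.shuffle(perm)                  # one shared permutation for all rows
    row_a = {j: 1.0 for j in rng.sample(range(D), 200)}
    row_b = {j: 1.0 for j in rng.sample(range(D), 200)}
    exact = sum(v * row_b[j] for j, v in row_a.items() if j in row_b)
    est = estimate_inner_product(
        build_sketch(row_a, perm, k), build_sketch(row_b, perm, k), D
    )
    print("exact:", exact, "estimate:", est)
```

Because each sketch stores only nonzero entries, its cost is governed by `k` rather than by the number of columns. The same trimmed sample could in principle feed other estimators (for example distances or joint histograms), which is the general-purpose property the abstract describes.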