Authors: Yongjoo Park, Michael Cafarella, Barzan Mozafari
DOI: 10.1109/ICDE.2016.7498287
Keywords: Data mining, Sampling (statistics), Computer science, Database, Scatter plot, Cluster analysis, Density estimation, Visualization, Stratified sampling, Set (abstract data type)
Abstract: Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations in interactive timescales is increasingly challenging. One approach for improving the speed of the visualization tool is via data reduction in order to reduce the computational overhead, but at a potential cost in visualization accuracy. Common data reduction techniques, such as uniform and stratified sampling, do not exploit the fact that the sampled tuples will be transformed into a visualization for human consumption. We propose visualization-aware sampling (VAS), which guarantees high-quality visualizations with a small subset of the entire dataset. We validate our method when applied to scatter and map plots for three common visualization goals: regression, density estimation, and clustering. The key to our sampling method's success is in choosing a set of tuples that minimizes a visualization-inspired loss function. While existing sampling approaches minimize the error of aggregation queries, we focus on a loss function that maximizes the visual fidelity of scatter plots. Our user study confirms that our proposed loss function correlates strongly with user success in using the resulting visualizations. Our experiments show that (i) VAS improves user's success by up to 35% in various visualization tasks, and (ii) VAS can achieve a required visualization quality up to 400× faster.
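
The abstract does not define the loss function or the selection procedure, so the following is only a toy illustration of the general idea of scoring a sample by how well it would cover a scatter plot, rather than by aggregate-query error. Everything here is an assumption made for illustration: `coverage_loss` (the largest distance from any data point to its nearest sampled point) and `greedy_spread_sample` (a farthest-point heuristic) are stand-ins, not the VAS loss or the VAS algorithm.

```python
import math
import random


def coverage_loss(points, sample):
    """Toy loss (an assumption, not the paper's): the largest distance
    from any data point to its nearest sampled point."""
    return max(min(math.dist(p, s) for s in sample) for p in points)


def greedy_spread_sample(points, k, start=0):
    """Pick k indices greedily, each time adding the point farthest from
    the current sample (farthest-point heuristic, a stand-in technique).
    Assumes k <= number of distinct points."""
    chosen = [start]
    # nearest[i] = distance from points[i] to the closest chosen point so far
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt]))
                   for d, p in zip(nearest, points)]
    return chosen


def demo(n=2000, k=50, seed=0):
    """Compare a uniform random sample with the greedy sample on skewed data."""
    rng = random.Random(seed)
    # Skewed toy data: a dense cluster plus 100 sparse background points.
    points = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(n - 100)]
    points += [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(100)]
    uniform = [points[i] for i in rng.sample(range(n), k)]
    spread = [points[i] for i in greedy_spread_sample(points, k)]
    return coverage_loss(points, uniform), coverage_loss(points, spread)


if __name__ == "__main__":
    uniform_loss, spread_loss = demo()
    print(f"uniform: {uniform_loss:.3f}  spread: {spread_loss:.3f}")
```

Running `demo()` prints the toy loss for both samples. Any difference between them reflects this particular stand-in measure only and says nothing about the 35% and 400× figures reported in the abstract.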