Visualization-aware sampling for very large databases

Authors: Yongjoo Park, Michael Cafarella, Barzan Mozafari

DOI: 10.1109/ICDE.2016.7498287

Keywords: Data mining, Sampling (statistics), Computer science, Database, Scatter plot, Cluster analysis, Density estimation, Visualization, Stratified sampling, Set (abstract data type)

Abstract: Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations in interactive timescales is increasingly challenging. One approach for improving the speed of a visualization tool is via data reduction in order to reduce the computational overhead, but at a potential cost in visualization accuracy. Common data reduction techniques, such as uniform and stratified sampling, do not exploit the fact that the sampled tuples will be transformed into a visualization for human consumption. We propose visualization-aware sampling (VAS), which guarantees high-quality visualizations with a small subset of the entire dataset. We validate our method when applied to scatter and map plots for three common visualization goals: regression, density estimation, and clustering. The key to our sampling method's success is in choosing a set of tuples that minimizes a visualization-inspired loss function. While existing sampling approaches minimize the error of aggregation queries, we focus on a loss function that maximizes the visual fidelity of scatter plots. Our user study confirms that our proposed loss function correlates strongly with user success in using the resulting visualizations. Our experiments show that (i) VAS improves user's success by up to 35% in various visualization tasks, and (ii) VAS can achieve a required visualization quality up to 400× faster.
