Synopses for probabilistic data over large domains

作者: Nicholas D. Larusso , Ambuj Singh

DOI: 10.1145/1951365.1951412

关键词:

摘要: Many real world applications produce data with uncertainties drawn from measurements over a continuous domain space. Recent research in the area of probabilistic databases has mainly focused on managing and querying discrete which is limited to small number values (i.e. order 10). When size increases, current methods fail due their nature explicitly storing each value/probability pair. Such are not capable extending use continuous-valued attributes. In this paper, we provide scalable, accurate, space efficient synopsis for uncertain attributes defined domain. Our construction all error-aware ensure that our provides an accurate representation underlying given budget. Additionally, able approximate query results error bounds.We extensive experimental evaluation show proposed improve upon state art terms time accuracy. particular, can be constructed O(N2) (where N tuples database). We also demonstrate ability answer variety interesting queries set reduced by up magnitude previous state-of-the-art method.

参考文章(25)
C. B. Dunham, Remez algorithm for Chebyshev approximation with interpolation Computing. ,vol. 28, pp. 75- 78 ,(1982) , 10.1007/BF02237998
Brooks Moses, Sheila McIlraith, Uri Lerner, Daphne Koller, Maricia Scott, Monitoring a complex physical system using a hybrid dynamic bayes net uncertainty in artificial intelligence. pp. 301- 310 ,(2002)
F. Korn, T. Johnson, H.V. Jagadish, Range selectivity estimation for continuous attributes statistical and scientific database management. pp. 244- 253 ,(1999) , 10.1109/SSDM.1999.787640
Panagiotis Karras, Dimitris Sacharidis, Nikos Mamoulis, Exploiting duality in summarization with deterministic guarantees Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '07. pp. 380- 389 ,(2007) , 10.1145/1281192.1281235
Yuhan Cai, Raymond Ng, Indexing spatio-temporal trajectories with Chebyshev polynomials international conference on management of data. pp. 599- 610 ,(2004) , 10.1145/1007568.1007636
William H. E. Day, Herbert Edelsbrunner, Efficient algorithms for agglomerative hierarchical clustering methods Journal of Classification. ,vol. 1, pp. 7- 24 ,(1984) , 10.1007/BF01890115
Christopher Re, Nilesh Dalvi, Dan Suciu, Efficient Top-k Query Evaluation on Probabilistic Data international conference on data engineering. pp. 886- 895 ,(2007) , 10.1109/ICDE.2007.367934
L. Veidinger, On the numerical determination of the best approximations in the Chebyshev sense Numerische Mathematik. ,vol. 2, pp. 99- 105 ,(1960) , 10.1007/BF01386215
Theodore J. Rivlin, The Chebyshev polynomials ,(1974)
Anthony Ralston, Rational chebyshev approximation by Remes' algorithms Numerische Mathematik. ,vol. 7, pp. 322- 330 ,(1965) , 10.1007/BF01436526