Top-k queries on uncertain data

Authors: Tingjian Ge, Stan Zdonik, Samuel Madden

DOI: 10.1145/1559845.1559886

Abstract: Uncertain data arises in a number of domains, including data integration and sensor networks. Top-k queries that rank results according to some user-defined score are an important tool for exploring large uncertain data sets. As several recent papers have observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs between reporting high-scoring tuples and reporting tuples with a high probability of being in the resulting data set. In this paper, we demonstrate the need to present the score distribution of top-k vectors to allow the user to choose between results along the score-probability dimensions. One option would be to display the complete distribution of all potential top-k tuple vectors, but this set is too large to compute. Instead, we propose to provide a number of typical vectors that effectively sample this distribution. We propose efficient algorithms to compute these vectors. We also extend the semantics and algorithms to the scenario of score ties, which is not dealt with in previous work in the area. Our work includes a systematic empirical study on both a real dataset and synthetic datasets.

References (22)
Jennifer Widom. Trio: A System for Integrated Management of Data, Accuracy, and Lineage. Conference on Innovative Data Systems Research, pp. 262-276 (2004).
Cheqing Jin, Ke Yi, Lei Chen, Jeffrey Xu Yu, Xuemin Lin. Sliding-window top-k queries on uncertain streams. Very Large Data Bases, vol. 1, pp. 301-312 (2008). DOI: 10.14778/1453856.1453892
Ihab F. Ilyas, George Beskales, Mohamed A. Soliman. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys, vol. 40, pp. 11- (2008). DOI: 10.1145/1391729.1391730
Ming Hua, Jian Pei, Wenjie Zhang, Xuemin Lin. Ranking queries on uncertain data. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data (SIGMOD '08), pp. 673-686 (2008). DOI: 10.1145/1376616.1376685
R. Hassin, A. Tamir. Improved complexity bounds for location problems on the real line. Operations Research Letters, vol. 10, pp. 395-402 (1991). DOI: 10.1016/0167-6377(91)90041-M
Christopher Re, Nilesh Dalvi, Dan Suciu. Efficient Top-k Query Evaluation on Probabilistic Data. International Conference on Data Engineering, pp. 886-895 (2007). DOI: 10.1109/ICDE.2007.367934
Sejoon Lim, Hari Balakrishnan, David Gifford, Samuel Madden, Daniela Rus. Stochastic motion planning and applications to traffic. The International Journal of Robotics Research, vol. 30, pp. 699-712 (2011). DOI: 10.1177/0278364910386259
Nilesh Dalvi, Dan Suciu. Efficient query evaluation on probabilistic databases. Very Large Data Bases, vol. 16, pp. 523-544 (2004). DOI: 10.1007/S00778-006-0004-3
Thomas M. Cover, Joy A. Thomas. Elements of Information Theory (1991).
Nesime Tatbul, Mark Buller, Reed Hoyt, Steve Mullen, Stan Zdonik. Confidence-based data management for personal area sensor networks. Proceedings of the 1st International Workshop on Data Management for Sensor Networks, in conjunction with VLDB 2004 (DMSN '04), pp. 24-31 (2004). DOI: 10.1145/1052199.1052204