Efficient Query Evaluation on Probabilistic XML Data

Author: P. Stapersma

Keywords: Information retrieval, Sargable, Computer science, View, Data mapping, Query optimization, Query by Example, Query language, Uncertain data, XPath, Data mining

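Abstract: In many application scenarios, reliability and accuracy of data are of great importance. Data is often uncertain or inconsistent because the exact state of the represented real-world objects is unknown. A number of uncertain data models have emerged to cope with imperfect data in order to guarantee a level of reliability and accuracy. These models include probabilistic XML (P-XML), an uncertain semi-structured data model, and U-Rel, an uncertain table-structured data model. U-Rel is used by MayBMS, an uncertain relational database management system (URDBMS) that provides scalable query evaluation. In contrast to U-Rel, there does not exist an efficient query evaluation mechanism for P-XML. In this thesis, we approach this problem by instructing MayBMS to cope with P-XML in order to evaluate XPath queries on P-XML data as SQL queries on U-Rel data. This entails two aspects: (1) a data mapping from P-XML to U-Rel that ensures that the same information is represented by database instances of both data structures, and (2) a query mapping from XPath to SQL that ensures that the same question is specified in both query languages. We present a specification of a P-XML to U-Rel data mapping and a corresponding XPath to SQL query mapping. Additionally, we present two designs of this specification. The first design constructs the data mapping in such a way that the corresponding query mapping is a traditional XPath to SQL mapping. The second design differs from the first in the sense that a component of the data mapping is evaluated as part of the query evaluation process. This offers the advantage that the data mapping is more efficient. The second design also allows for a number of optimizations that affect the performance of the query evaluation process. However, this process is burdened with the extra task of evaluating the data mapping component. An extensive experimental evaluation on synthetically generated data sets and real-world data sets shows that our implementation of the second design is more efficient in most scenarios. Not only is the P-XML data mapping executed more efficiently, the query evaluation performance is also improved in most scenarios.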
