Similarity Joins in Relational Database Systems

作者: Nikolaus Augsten , Michael H. Bhlen

DOI:

关键词:

摘要: State-of-the-art database systems manage and process a variety of complex objects, including strings trees. For such objects equality comparisons are often not meaningful must be replaced by similarity comparisons. This book describes the concepts techniques to incorporate into systems. We start out discussing properties trees, identify edit distance as de facto standard for comparing objects. Since is computationally expensive, token-based distances have been introduced speed up computations. The basic idea decompose sets tokens that can compared efficiently. Token-based used compute an approximation prune expensive calculations. A key observation when computing joins many object pairs, which computed, very different from each other. Filters exploit this property improve performance joins. filter preprocesses input data produces set candidate pairs. function evaluated on pairs only. describe essential query processing filters based lower upper bounds. token we prefix, size, positional partitioning filters, avoid computation small intersections needed since would too low. Table Contents: Preface / Acknowledgments Introduction Data Types Edit-Based Distances Token-Based Query Processing Techniques Token Equality Joins Conclusion Bibliography Authors' Biographies Index

参考文章(75)
Yasin N. Silva, Spencer S. Pearson, Jason A. Cheney, Database Similarity Join for Metric Spaces similarity search and applications. pp. 266- 279 ,(2013) , 10.1007/978-3-642-41062-8_27
Tatsuya Akutsu, Takeyuki Tamura, Daiji Fukagawa, Atsuhiro Takasu, Efficient Exponential Time Algorithms for Edit Distance between Unordered Trees Combinatorial Pattern Matching. pp. 360- 372 ,(2012) , 10.1007/978-3-642-31265-6_29
D. Buttler, A Short Survey of Document Structure Similarity Algorithms international conference on internet computing. pp. 3- 9 ,(2004)
Nikolaus Augsten, Michael Böhlen, Johann Gamper, Reducing the Integration of Public Administration Databases to Approximate Tree Matching electronic government. pp. 102- 107 ,(2004) , 10.1007/978-3-540-30078-6_17
Hiroshi Mamitsuka, Tatsuya Akutsu, Nobuhisa Ueda, Kiyoko F. Aoki, Yasushi Okuno, Minoru Kanehisa, Atsuko Yamaguchi, Efficient tree-matching methods for accurate carbohydrate database queries. Genome Informatics. ,vol. 14, pp. 134- 143 ,(2003) , 10.11234/GI1990.14.134
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Similarity Search in High Dimensions via Hashing very large data bases. pp. 518- 529 ,(1999)
Erkki Sutinen, Jorma Tarhio, Filtration with q-Samples in Approximate String Matching combinatorial pattern matching. pp. 50- 63 ,(1996) , 10.1007/3-540-61258-0_4
Philip N. Klein, Computing the Edit-Distance between Unrooted Ordered Trees european symposium on algorithms. pp. 91- 102 ,(1998) , 10.1007/3-540-68530-8_8