Authors: Nikolaus Augsten, Michael H. Böhlen
DOI:
Keywords:
Abstract: State-of-the-art database systems manage and process a variety of complex objects, including strings and trees. For such objects, equality comparisons are often not meaningful and must be replaced by similarity comparisons. This book describes the concepts and techniques to incorporate similarity into database systems. We start out by discussing the properties of strings and trees, and identify the edit distance as the de facto standard for comparing complex objects. Since the edit distance is computationally expensive, token-based distances have been introduced to speed up edit distance computations. The basic idea is to decompose complex objects into sets of tokens that can be compared efficiently. Token-based distances are used to compute an approximation of the edit distance and to prune expensive edit distance calculations. A key observation when computing similarity joins is that many of the object pairs for which the similarity is computed are very different from each other. Filters exploit this property to improve the performance of similarity joins. A filter preprocesses the input data sets and produces a set of candidate pairs. The distance function is evaluated on the candidate pairs only. We describe the essential query processing techniques for filters based on lower and upper bounds. For token equality joins we describe prefix, size, positional, and partitioning filters, which can be used to avoid the computation of small intersections that are not needed since the similarity would be too low.
Table of Contents: Preface / Acknowledgments / Introduction / Data Types / Edit-Based Distances / Token-Based Distances / Query Processing Techniques / Filters for Token Equality Joins / Conclusion / Bibliography / Authors' Biographies / Index
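The filter-and-verify idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not taken from the book: the function names (`qgrams`, `passes_filters`, `similarity_join`), the choice of q-grams as tokens, the `#` padding character, and the nested-loop candidate generation are all assumptions made for illustration. The sketch applies a size (length) filter and a q-gram count filter as lower-bound tests, then evaluates the edit distance on the surviving candidate pairs only.

```python
from collections import Counter
from itertools import combinations


def qgrams(s, q=2):
    """Bag of q-grams of s, padded with q-1 '#' characters on each side.

    Assumption: '#' does not occur in the input strings.
    """
    padded = "#" * (q - 1) + s + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def edit_distance(s, t):
    """Standard dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def passes_filters(s, t, k, q=2):
    """Cheap tests; False means the edit distance certainly exceeds k."""
    # Size filter: each edit operation changes the length by at most 1.
    if abs(len(s) - len(t)) > k:
        return False
    # Count filter: one edit operation affects at most q q-grams, so
    # strings within distance k must share at least this many q-grams.
    needed = max(len(s), len(t)) + q - 1 - k * q
    overlap = sum((qgrams(s, q) & qgrams(t, q)).values())
    return overlap >= needed


def similarity_join(strings, k, q=2):
    """Self-join: all pairs with edit distance <= k, via filter-and-verify."""
    candidates = [(s, t) for s, t in combinations(strings, 2)
                  if passes_filters(s, t, k, q)]
    # The expensive distance function runs on the candidate pairs only.
    return [(s, t) for s, t in candidates if edit_distance(s, t) <= k]
```

For example, `similarity_join(["kitten", "sitten", "sitting", "database"], k=1)` rejects every pair involving `"database"` on length alone and verifies only the remaining candidates. A real system would replace the nested loop with an index over tokens; the loop here only keeps the sketch short.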