Authors: Yasin N. Silva, Spencer S. Pearson, Jason A. Cheney
DOI: 10.1007/978-3-642-41062-8_27
Keywords: Joins, Data mining, Data processing, Database, Information retrieval, Sort-merge join, Mathematics, Operator (computer programming), Metric space, Hash join, Similarity (network science), Join (sigma algebra)
Abstract: Similarity Joins are recognized among the most useful data processing and analysis operations. They retrieve all data pairs whose distances are smaller than a predefined threshold ε. While several standalone implementations have been proposed, very little work has addressed the implementation of Similarity Joins as physical database operators. In this paper, we focus on the study, design, and implementation of a Similarity Join database operator for any dataset that lies in a metric space (DBSimJoin). We describe the changes in each query engine module to implement DBSimJoin and provide details of our implementation in PostgreSQL. The extensive performance evaluation shows that DBSimJoin significantly outperforms alternative approaches.
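Illustration: the join semantics described in the abstract (return every pair whose distance is below a threshold) can be sketched with a naive quadratic baseline. This is not the DBSimJoin operator and says nothing about how the paper implements it. The function name `naive_similarity_join`, the choice of Euclidean distance, and the sample points are assumptions made only for this sketch.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def naive_similarity_join(points, eps):
    """Return index pairs (i, j) with i < j whose distance is strictly below eps.

    Hypothetical helper: compares every pair, so it runs in O(n^2) time.
    Euclidean distance is an assumption; any metric could be substituted.
    """
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) < eps
    ]


# Made-up sample data: two tight clusters that are far apart.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 4.0), (3.2, 4.1)]
pairs = naive_similarity_join(points, eps=1.0)
print(pairs)
```

A baseline like this is mainly useful as a correctness reference for small inputs, since its cost grows quadratically with the number of points.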