Authors: Maurice van Keulen, Fabian Panse, Norbert Ritter
DOI:
Keywords:
Abstract: In current research, duplicate detection is usually considered as a deterministic approach in which tuples are either declared duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity. In deterministic approaches, however, this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impacts of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic and human effort can be reduced to a large extent. Unfortunately, a full-indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
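
Illustration: the contrast between a deterministic decision and an indeterministic one can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method from the paper: the names `World` and `decide`, the thresholds `lower` and `upper`, and the reading of a similarity score in [0, 1] as a match probability are all hypothetical choices made for this example. Pairs with a clear score receive a single certain outcome, while pairs in the uncertain band keep both possible worlds with their probabilities, which roughly mirrors the semi-indeterministic idea of limiting how many decisions are handled indeterministically.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    """One possible outcome for a tuple pair (hypothetical helper type)."""
    is_duplicate: bool
    probability: float


def decide(similarity: float, lower: float = 0.3, upper: float = 0.8) -> list[World]:
    """Return the possible worlds kept for one tuple pair.

    Assumption: `similarity` lies in [0, 1] and is used directly as the
    probability that the two tuples are duplicates. The thresholds are
    illustrative values, not taken from the paper.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if similarity >= upper:
        # Clear match: decide deterministically, keep one world.
        return [World(is_duplicate=True, probability=1.0)]
    if similarity <= lower:
        # Clear non-match: decide deterministically, keep one world.
        return [World(is_duplicate=False, probability=1.0)]
    # Uncertain pair: keep both worlds instead of forcing a decision.
    return [
        World(is_duplicate=True, probability=similarity),
        World(is_duplicate=False, probability=1.0 - similarity),
    ]


if __name__ == "__main__":
    for score in (0.9, 0.5, 0.1):
        print(score, decide(score))
```

Setting `lower` and `upper` to the same value reduces the sketch to a purely deterministic threshold, while widening the band handles more pairs indeterministically at a higher cost in time and storage.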