Indeterministic Handling of Uncertain Decisions in Duplicate Detection

Authors: Maurice van Keulen, Fabian Panse, Norbert Ritter

DOI:

Keywords:

Abstract: In current research, duplicate detection is usually considered a deterministic approach in which tuples are either declared duplicates or not. However, most often it is not completely clear whether two tuples represent the same real-world entity or not. In deterministic approaches, however, this uncertainty is ignored, which in turn can lead to false decisions. In this paper, we present an indeterministic approach for handling uncertain decisions in a duplicate detection process by using a probabilistic target schema. Thus, instead of deciding between multiple possible worlds, all these worlds can be modeled in the resulting data. This approach minimizes the negative impacts of false decisions. Furthermore, the duplicate detection process becomes almost fully automatic, and human effort can be reduced to a large extent. Unfortunately, a full-indeterministic approach is by definition too expensive (in time as well as in storage) and hence impractical. For that reason, we additionally introduce several semi-indeterministic methods for heuristically reducing the set of indeterministically handled decisions in a meaningful way.
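
The abstract contrasts forcing every decision (deterministic) with keeping all possible worlds (indeterministic), and notes that the full-indeterministic variant is too expensive. The Python sketch below is a hypothetical illustration of that trade-off, not the authors' method. It assumes each candidate pair already has a match probability, treats pair decisions as independent (ignoring transitivity and the probabilistic target schema itself), and uses an assumed uncertainty band (`lower`, `upper`) as one possible semi-indeterministic heuristic. The function names `deterministic` and `possible_worlds` are made up for illustration. Since every uncertain decision doubles the number of worlds, k uncertain pairs yield 2^k worlds.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical sketch, not the paper's algorithm. Assumptions: each candidate
# pair of tuples has a precomputed match probability, and pair decisions are
# treated as independent (transitivity between pairs is ignored).
Pair = Tuple[str, str]
World = Dict[Pair, bool]


def deterministic(scores: Dict[Pair, float], threshold: float = 0.5) -> World:
    """Classic deterministic detection: every pair is forced to match or non-match."""
    return {pair: p >= threshold for pair, p in scores.items()}


def possible_worlds(
    scores: Dict[Pair, float], lower: float = 0.0, upper: float = 1.0
) -> List[Tuple[World, float]]:
    """Enumerate possible worlds with their probabilities.

    Pairs with lower < p < upper stay uncertain and both outcomes are kept.
    Pairs with p >= upper are fixed as duplicates, pairs with p <= lower as
    non-duplicates (an assumed semi-indeterministic heuristic). The defaults
    (0.0, 1.0) keep every pair with 0 < p < 1 uncertain, i.e. the
    full-indeterministic case.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    fixed: World = {}
    uncertain: List[Pair] = []
    for pair, p in scores.items():
        if p >= upper:
            fixed[pair] = True
        elif p <= lower:
            fixed[pair] = False
        else:
            uncertain.append(pair)

    worlds: List[Tuple[World, float]] = []
    # Each uncertain decision doubles the number of worlds: 2 ** len(uncertain).
    # Fixed decisions are treated as certain, so world probabilities sum to 1.
    for outcome in product([True, False], repeat=len(uncertain)):
        world = dict(fixed)
        prob = 1.0
        for pair, is_dup in zip(uncertain, outcome):
            world[pair] = is_dup
            prob *= scores[pair] if is_dup else 1.0 - scores[pair]
        worlds.append((world, prob))
    return worlds


if __name__ == "__main__":
    scores = {
        ("t1", "t2"): 0.95,
        ("t1", "t3"): 0.55,
        ("t2", "t3"): 0.40,
        ("t3", "t4"): 0.05,
    }
    print(deterministic(scores))
    print(len(possible_worlds(scores)))            # 16: all four pairs stay uncertain
    print(len(possible_worlds(scores, 0.2, 0.8)))  # 4: only two pairs stay uncertain
```

Narrowing the band from (0, 1) to (0.2, 0.8) cuts the example from 16 worlds to 4: the clear-cut pairs are decided up front, and only the genuinely doubtful ones are handled indeterministically. Note that independent pairwise worlds can be inconsistent (for instance t1 = t2 and t1 = t3 but t2 ≠ t3), which a real system would have to resolve.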

Similar articles (4)

Efficient Query Evaluation on Probabilistic XML Data

Managing Uncertainty: The Road Towards Better Data Interoperability

Managing Uncertainty: The Road Towards Better Data Interoperability / Verwaltung von Unsicherheit: Der Weg zu besserer Interoperabilität

A complete solution for duplication detection over uncertain data