Record Linkage: Current Practice and Future Directions

Authors: Lifang Gu, Deanne Vickers, Chris Rainsford, Rohan Baxter

Abstract: Record linkage is the task of quickly and accurately identifying records corresponding to the same entity from one or more data sources. Record linkage is also known as data cleaning, entity reconciliation or identification, and the merge/purge problem. This paper presents the “standard” probabilistic record linkage model and the associated algorithm. Recent work in information retrieval, federated database systems, and data mining has proposed alternatives to key components of the standard algorithm. The impact of these alternatives on the standard approach is assessed. The key question is whether and how these new alternatives are better in terms of time, accuracy, and degree of automation for a particular record linkage application.
