Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection

Author: Peter Christen

Abstract: Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching, and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching and its scalability to large databases. Peter Christen's book is divided into three parts: Part I, "Overview", introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, "Steps of the Data Matching Process", then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, Part III, "Further Topics", deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, it briefly describes the main features of many research and open source systems available today. By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. Especially, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaption and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
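
The main steps named in the abstract (pre-processing, indexing, comparison, classification) can be illustrated with a minimal Python sketch. It is not taken from the book: the toy records, the blocking key (first letter of the surname plus the city), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.85 threshold are all assumptions chosen only to keep the example short.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (record id, name, city).
RECORDS = [
    (1, "Ann Miller", "Canberra"),
    (2, "Anne  Miller", "canberra"),
    (3, "Anna Mueller", "Sydney"),
    (4, "Bob Stone", "Canberra"),
]


def preprocess(value):
    """Pre-processing: lowercase and collapse repeated whitespace."""
    return " ".join(value.lower().split())


def blocking_key(record):
    """Indexing: first letter of the surname plus the city (an assumed key)."""
    _, name, city = record
    surname = preprocess(name).split()[-1]
    return surname[0] + "|" + preprocess(city)


def candidate_pairs(records):
    """Only records that share a blocking key are compared with each other."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    for block in blocks.values():
        yield from combinations(block, 2)


def similarity(a, b):
    """Comparison: approximate string similarity of the cleaned names."""
    return SequenceMatcher(None, preprocess(a[1]), preprocess(b[1])).ratio()


def classify(records, threshold=0.85):
    """Classification: a pair is a match if its similarity reaches the threshold."""
    return [
        (a[0], b[0])
        for a, b in candidate_pairs(records)
        if similarity(a, b) >= threshold
    ]


matches = classify(RECORDS)
print(matches)
```

With these toy records, blocking leaves a single candidate pair (records 1 and 2) instead of the six pairs a full comparison would need, which is the scalability benefit of indexing that the abstract alludes to. A real system would typically compare several fields and use a more careful classifier than a single threshold.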
