Learning Name Variants from Inexact High-Confidence Matches

Authors: Gerrit Bloothooft, Marijn Schraagen

DOI: 10.1007/978-3-319-19884-2_4

Abstract: Name variants which differ by more than a few characters can seriously hamper record linkage. A method is described by which variants of first names and surnames can be learned automatically from records that contain more information than needed for a true link decision. Post-processing and limited manual intervention (active learning) is unavoidable, however, to differentiate errors in the original and digitised data from variants. The method is demonstrated on the basis of an analysis of 14.8 million records from the Dutch vital registration.

References (15)
Paul Bratley, Serge Lusignan. Information processing in dictionary making: Some technical guidelines. Computers and the Humanities, vol. 10, pp. 133-143 (1976). DOI: 10.1007/BF02426299
Sunita Sarawagi, Anuradha Bhamidipaty. Interactive deduplication using active learning. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02, pp. 269-278 (2002). DOI: 10.1145/775047.775087
Timothy de Vries, Hui Ke, Sanjay Chawla, Peter Christen. Robust record linkage blocking using suffix arrays. Conference on Information and Knowledge Management, pp. 305-314 (2009). DOI: 10.1145/1645953.1645994
James L. Dolby. An Algorithm for Variable-Length Proper-Name Compression. Journal of Library Automation, vol. 3, pp. 257-275 (1970). DOI: 10.6017/ITAL.V3I4.5259
Fredrik Olsson. A literature survey of active machine learning in the context of natural language processing. Swedish Institute of Computer Science (2009).
Indrajit Bhattacharya, Lise Getoor. Collective entity resolution in relational data. ACM Transactions on Knowledge Discovery From Data, vol. 1, pp. 5- (2007). DOI: 10.1145/1217299.1217304
Patricia Driscoll. Computational Methods for Name Normalization Using Hypocoristic Personal Name Variants. Multi-source, Multilingual Information Extraction and Summarization, pp. 73-91 (2013). DOI: 10.1007/978-3-642-28569-1_4
Lawrence Philips. The double metaphone search algorithm. The C Users Journal archive, vol. 18, pp. 38-43 (2000).