Indexing Methods for Faster and More Effective Person Name Search.

Author: Mark Arehart

DOI:

Keywords:

Abstract: This paper compares several indexing methods for person names extracted from text, developed for an information retrieval system with requirements for fast approximate matching of noisy and multicultural Romanized names. Such algorithms are computationally expensive and unacceptably slow when used without an indexing or blocking step. The goal is to create a small candidate pool, containing all the true matches, that can be exhaustively searched by a more effective but slower name comparison method. In addition to dramatically faster search, some of the methods evaluated here led to modest gains in effectiveness by eliminating false positives. Four indexing techniques, using either phonetic keys or substrings of name segments, with and without segment stopword lists, were combined with three name matching algorithms. On a test set of 700 queries run against 70K names, the best-performing technique took just 2.1% as long as the naive exhaustive search and increased F1 by 3 points, showing that an appropriate indexing technique can increase both speed and effectiveness.
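The two-stage idea in the abstract (a cheap index builds a small candidate pool, then a slower comparator scores only that pool) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the phonetic key is a simplified Soundex, the stopword list `SEGMENT_STOPWORDS` is invented, the helper names (`segment_keys`, `build_index`, `candidate_pool`, `search`) are hypothetical, and `difflib.SequenceMatcher` stands in for the slower name comparison method.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical stopword list of frequent name segments (an assumption).
SEGMENT_STOPWORDS = {"al", "bin", "de", "van", "von"}

# Letter-to-digit table used by Soundex.
_SOUNDEX_CODES = {
    c: d
    for d, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for c in letters
}


def soundex(segment: str) -> str:
    """Simplified Soundex: first letter plus up to three digit codes."""
    letters = [c for c in segment.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            key += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (key + "000")[:4]


def segment_keys(name: str) -> set:
    """Phonetic keys for each name segment, skipping stopword segments."""
    segments = name.lower().replace("-", " ").split()
    keys = {soundex(s) for s in segments if s not in SEGMENT_STOPWORDS}
    keys.discard("")
    return keys


def build_index(names: list) -> dict:
    """Inverted index: phonetic key -> positions of names containing it."""
    index = defaultdict(set)
    for i, name in enumerate(names):
        for key in segment_keys(name):
            index[key].add(i)
    return index


def candidate_pool(query: str, index: dict) -> set:
    """Blocking step: names sharing at least one segment key with the query."""
    pool = set()
    for key in segment_keys(query):
        pool |= index.get(key, set())
    return pool


def search(query: str, names: list, index: dict, threshold: float = 0.75) -> list:
    """Score only the candidate pool with the slower comparator."""
    scored = []
    for i in candidate_pool(query, index):
        score = SequenceMatcher(None, query.lower(), names[i].lower()).ratio()
        if score >= threshold:
            scored.append((score, names[i]))
    return sorted(scored, reverse=True)


names = ["Mohammed al Rashid", "Muhammad Rashid", "John Smith",
         "Jon Smyth", "Maria de la Cruz"]
index = build_index(names)
print(search("Mohamed Rashid", names, index))
```

In this sketch the comparator never sees "John Smith" or "Maria de la Cruz" for the query "Mohamed Rashid", because they share no segment key with it. That is the mechanism by which a blocking step can both cut search time and remove some false positives, at the risk of missing a true match whose keys all differ from the query's.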


