Learning-based fusion for data deduplication: A robust and automated solution

Author: Jared Dinerstein

DOI:

Keywords: Active learning (machine learning), Computer science, Learning based, Fusion, Data mining, Support vector machine, Data deduplication

Abstract:

References (22)
Peter Christen, A two-step classification approach to unsupervised record linkage. Australasian Data Mining Conference, pp. 111-119 (2007).
D. G. Altman, J. M. Bland, Diagnostic tests 1: Sensitivity and specificity. BMJ, vol. 308, p. 1552 (1994). DOI: 10.1136/BMJ.308.6943.1552
V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, vol. 10, pp. 707-710 (1966).
Esko Ukkonen, Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, vol. 92, pp. 191-211 (1992). DOI: 10.1016/0304-3975(92)90143-4
N. S. D'Andrea Du Bois, A Solution to the Problem of Linking Multivariate Documents. Journal of the American Statistical Association, vol. 64, pp. 163-174 (1969). DOI: 10.1080/01621459.1969.10500961
H. B. Newcombe, J. M. Kennedy, S. J. Axford, A. P. James, Automatic linkage of vital records. Science, vol. 130, pp. 954-959 (1959). DOI: 10.1126/SCIENCE.130.3381.954
Ivan P. Fellegi, Alan B. Sunter, A Theory for Record Linkage. Journal of the American Statistical Association, vol. 64, pp. 1183-1210 (1969). DOI: 10.1080/01621459.1969.10501049
T. F. Smith, M. S. Waterman, Identification of common molecular subsequences. Journal of Molecular Biology, vol. 147, pp. 195-197 (1981). DOI: 10.1016/0022-2836(81)90087-5
V. S. Verykios, P. G. Ipeirotis, A. K. Elmagarmid, Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering, vol. 19, pp. 1-16 (2007). DOI: 10.1109/TKDE.2007.9