Resolving and merging duplicate records using machine learning

Authors: Richard Glenn Morris, Xinchuan Zeng, David Randal Elkington

Abstract: According to various embodiments of the present invention, an automated technique is implemented for resolving and merging fields accurately and reliably, given a set of duplicated records that represent the same entity. In at least one embodiment, the system uses a machine learning (ML) method to train a model from training data and to learn from users how to efficiently resolve and merge fields. The method of the invention builds feature vectors as input to its ML method, and the system may apply Hierarchical Based Sequencing (HBS) and/or Multiple Output Relaxation (MOR) models in resolving and merging fields. Training data can come from any suitable source or combination of sources.
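The abstract describes learning, from users' past merge decisions, which of several conflicting field values to keep, with feature vectors as the model input. The Python sketch below illustrates only that general idea under stated assumptions: the two features (share of duplicates that agree on a value, and value length relative to the longest candidate), the `(records, field, chosen_value)` training format, and all function names are hypothetical, and a plain pairwise ranking perceptron stands in for the learner. It does not implement the HBS or MOR models, whose details the abstract does not give.

```python
# Hypothetical sketch: a pairwise ranking perceptron standing in for the
# patent's HBS/MOR models. Features and names are illustrative assumptions.
from collections import Counter


def candidate_features(records, field):
    """Map each distinct non-empty value of `field` to a feature vector.

    Hypothetical features: the share of duplicate records carrying the value,
    and its length relative to the longest candidate value.
    """
    values = [str(r[field]) for r in records if r.get(field)]
    if not values:
        return {}
    counts = Counter(values)
    max_len = max(len(v) for v in counts)
    return {v: [n / len(values), len(v) / max_len] for v, n in counts.items()}


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train(history, epochs=20):
    """Learn linear weights from (records, field, chosen_value) triples.

    Whenever the value a user kept does not outscore a rejected value, the
    feature difference (kept minus rejected) is added to the weights.
    Assumes chosen_value is one of the non-empty candidate values.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for records, field, chosen in history:
            feats = candidate_features(records, field)
            for value, x in feats.items():
                if value == chosen:
                    continue
                diff = [c - o for c, o in zip(feats[chosen], x)]
                if dot(w, diff) <= 0:
                    w = [wi + di for wi, di in zip(w, diff)]
                    mistakes += 1
        if mistakes == 0:  # every past decision is already ranked correctly
            break
    return w


def resolve_field(records, field, w):
    """Return the highest-scoring candidate value, or None if all are empty."""
    feats = candidate_features(records, field)
    if not feats:
        return None
    return max(feats, key=lambda v: dot(w, feats[v]))


def merge(records, w):
    """Merge duplicate records field by field into a single record."""
    fields = sorted({f for r in records for f in r})
    return {f: resolve_field(records, f, w) for f in fields}


# Usage with made-up merge history and duplicates.
history = [
    ([{"name": "J. Smith"}, {"name": "John Smith"}, {"name": "John Smith"}], "name", "John Smith"),
    ([{"city": "NYC"}, {"city": "New York"}, {"city": "New York"}], "city", "New York"),
    ([{"name": "Ann Lee"}, {"name": "A. Lee"}], "name", "Ann Lee"),
]
w = train(history)
dupes = [
    {"name": "Mary Jones", "city": "Boston"},
    {"name": "M. Jones", "city": "Boston"},
    {"name": "Mary Jones", "city": ""},
]
print(merge(dupes, w))  # {'city': 'Boston', 'name': 'Mary Jones'}
```

A linear ranker keeps the example short and easy to inspect; any model that scores candidate values from their feature vectors could take its place.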
