i-HUMO: An Interactive Human and Machine Cooperation Framework for Entity Resolution with Quality Guarantees.

Authors: Qun Chen, Zhanhuai Li, Youcef Nafa, Boyi Hou, Zhaoqiang Chen

DOI:

Keywords: Risk analysis (business), Process (engineering), Selection (linguistics), Quality (business), Control (management), Artificial intelligence, Precision and recall, Machine learning, Computer science

Abstract: Even though many approaches have been proposed for entity resolution (ER), it remains very challenging to find one with quality guarantees. To this end, we propose an interactive HUman and Machine cOoperation framework for ER, denoted by i-HUMO. Similar to the existing HUMO framework, i-HUMO enforces both precision and recall levels by dividing the ER workload between the human and the machine. It essentially makes the machine label easy instances while assigning the more challenging instances to the human. However, i-HUMO is a major improvement over HUMO in that it is interactive: its process of human workload selection is optimized based on real-time risk analysis of human-labeled results as well as pre-specified metrics. In this paper, we first introduce the i-HUMO framework and then present the risk analysis technique used to prioritize instances for manual labeling. Finally, we empirically evaluate i-HUMO's performance on real data. Our extensive experiments show that i-HUMO is effective in enforcing quality guarantees and that, compared with state-of-the-art alternatives, it can achieve better quality control with reduced human cost.
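The division of labor described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Pair` type, the `similarity` score, the fixed `lower`/`upper` thresholds, and the distance-from-0.5 risk measure are all assumptions made for illustration. The sketch only shows the general shape of the idea, in which the machine labels the easy pairs and the remaining pairs are queued for a human in order of estimated risk.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    pair_id: str
    similarity: float  # hypothetical machine metric in [0, 1]


def split_workload(pairs, lower, upper):
    """Machine labels pairs outside [lower, upper]; the rest go to the human.

    The thresholds are assumed to be given; choosing them so that precision
    and recall targets hold is the hard part and is not shown here.
    """
    machine_labels = {}
    human_pairs = []
    for p in pairs:
        if p.similarity < lower:
            machine_labels[p.pair_id] = False  # confidently a non-match
        elif p.similarity > upper:
            machine_labels[p.pair_id] = True   # confidently a match
        else:
            human_pairs.append(p)              # too uncertain for the machine
    return machine_labels, human_pairs


def prioritize(human_pairs):
    """Order pairs by closeness to 0.5, a stand-in for a real risk measure."""
    return sorted(human_pairs, key=lambda p: abs(p.similarity - 0.5))


pairs = [Pair("a", 0.05), Pair("b", 0.45), Pair("c", 0.70), Pair("d", 0.95)]
machine_labels, human_queue = split_workload(pairs, lower=0.2, upper=0.8)
ordered_ids = [p.pair_id for p in prioritize(human_queue)]
```

With the toy data above, pairs `a` and `d` are labeled by the machine, while `b` and `c` are sent to the human, with `b` first because its score is closest to 0.5. An interactive variant would re-estimate the risk after each batch of human labels rather than sorting once.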
