A Latent Dirichlet Model for Unsupervised Entity Resolution

Authors: Lise Getoor, Indrajit Bhattacharya

Abstract: Entity resolution has received considerable attention in recent years. Given many references to underlying entities, the goal is to predict which references correspond to the same entity. We show how to extend the Latent Dirichlet Allocation model for this task and propose a probabilistic model for collective entity resolution in relational domains where references are connected to each other. Our approach differs from other recently proposed entity resolution approaches in that it a) is generative, b) does not make pair-wise decisions, and c) captures relations between entities through a hidden group variable. We propose a novel sampling algorithm for collective entity resolution that is unsupervised and also takes entity relations into account. Additionally, we do not assume the domain of entities to be known, and we show how to infer the number of entities from the data. We demonstrate the utility and practicality of our relational entity resolution approach for author resolution on two real-world bibliographic datasets. In addition, we present preliminary results on characterizing the conditions under which relational information is useful.
