Entity matching for intelligent information integration

Authors: Hsinchun Chen, Gang Wang

Keywords: Information system, Heuristic, Decision rule, Artificial intelligence, Naive Bayes classifier, Machine learning, Information integration, Computer science, Matching (statistics), Probabilistic logic, Data mining, Feature selection

Abstract: Due to the rapid development of information technologies, especially network technologies, business activities have never been as integrated as they are now. Business decision making often requires gathering information from different sources. This dissertation focuses on the problem of entity matching: associating corresponding information elements within or across information systems. It is devoted to providing complete and accurate information for business decision making. Three challenges are identified that may affect entity matching performance: selection of representative features, matching techniques, and searching strategy. The dissertation first provides a theoretical foundation for entity matching by connecting it to the similarity and categorization theories developed in the field of cognitive science. The theories provide guidance for tackling the three challenges identified. First, based on the feature contrast model, we propose a case-study-based methodology that identifies key features that uniquely identify an entity. Second, we propose a record comparison technique and a multi-layer naive Bayes model, which correspond respectively to the deterministic and probability response models defined in the categorization theory. Experiments show that both techniques are effective in linking deceptive criminal identities. However, the probabilistic technique is preferable because it uses a semi-supervised learning method, which requires less human intervention during training. Third, based on the prototype access assumption proposed in the categorization theory, we apply an adaptive detection algorithm to entity matching so that efficiency can be greatly improved by the reduced search space. Experiments show that this technique significantly improves matching efficiency without significant accuracy loss. Based on the above findings, we developed the Arizona IDMatcher, an identity matching system based on the multi-layer naive Bayes model and the adaptive detection method. We compare the system against the IBM Identity Resolution tool, a leading commercial product developed using heuristic decision rules. Experiments do not suggest a clear winner, but show the pros and cons of each system. The Arizona IDMatcher is able to capture more true matches than IBM Identity Resolution (i.e., high recall). On the other hand, the matches identified by IBM Identity Resolution are mostly true matches (i.e., high precision).
