Large-Scale Chinese Cross-Document Entity Disambiguation and Information Fusion

Authors: Xiaoge Li, Sugang Ma, Xiaohui Zhou

DOI: 10.1007/978-3-319-10596-3_9

Keywords: Pipeline (software); Entity–relationship model; Graph (abstract data type); Artificial intelligence; Natural language processing; Modular design; Information extraction; Computer science; Alias; Knowledge base; Scale (map)

Abstract: Cross-document entity disambiguation is the problem of identifying whether mentions from different documents refer to the same entity or to distinct entities. It arises in information fusion and automated knowledge base construction. In this paper, we describe a Chinese Information Extraction (IE) system based on the Hadoop framework. The system involves document-level IE and corpus-level IE, with a pipelined, multi-level modular approach to Named Entity Recognition (EDR), relationship extraction, and information fusion. Information associated with each mention name can be merged into rich profiles. For our co-reference and alias module, we performed agglomerative hierarchical clustering using MapReduce. Visualized results, in the form of an entity-centric graph, have been demonstrated.
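
The abstract names agglomerative hierarchical clustering as the technique behind the co-reference and alias module but gives no detail. The sketch below is only an illustration of that general technique, not the authors' implementation: it runs on a single machine rather than on MapReduce, and the bag-of-words context vectors, cosine similarity, average-link criterion, `threshold` value, and the function name `cluster_mentions` are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_mentions(mentions, threshold=0.3):
    """Average-link agglomerative clustering of mentions sharing one name.

    mentions: list of (doc_id, context_tokens) pairs (hypothetical input shape).
    threshold: assumed similarity cutoff below which merging stops.
    Returns a list of clusters, each a sorted list of doc_ids.
    """
    vectors = [Counter(tokens) for _, tokens in mentions]
    # Start with every mention in its own cluster.
    clusters = [[i] for i in range(len(mentions))]

    def avg_link(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    return [sorted(mentions[i][0] for i in c) for c in clusters]
```

Each resulting cluster stands for one hypothesized real-world entity, so the information extracted for its mentions could then be merged into a single profile. In a distributed setting the pairwise similarity step is the expensive part, which is presumably why the authors turned to MapReduce.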
