Semantically Annotated Snapshot of the English Wikipedia.

Authors: Jordi Atserias, Giuseppe Attardi, Hugo Zaragoza, Massimiliano Ciaramita

Keywords: Computer science; Explicit semantic analysis; Snapshot (computer storage); Documentation; Information retrieval; World Wide Web

Abstract: This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikipedia. In recent years Wikipedia has become a valuable resource for both the Natural Language Processing (NLP) community and the Information Retrieval (IR) community. Although NLP technology for processing Wikipedia already exists, not all researchers and developers have the computational resources to process such a volume of information. Moreover, the use of different versions of Wikipedia, processed differently, might make it difficult to compare results. The aim of this work is to provide easy access to syntactic and semantic annotations for researchers in both the NLP and IR communities by building a reference corpus that homogenizes experiments and makes results comparable. These resources, including an “entity containment” derived graph, are licensed under the GNU Free Documentation License and available from http://www.yr-bcn.es/semanticWikipedia
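The abstract names an “entity containment” derived graph without defining it. As a minimal sketch, we assume the graph is bipartite: each annotated sentence is linked to the entity mentions tagged inside it, and entities can then be related through the sentences they share. How SW1 actually encodes the graph is not described here. The input format (`(sentence_id, [(entity_text, tag), ...])` pairs), the tag labels and the function names are hypothetical illustrations, not the SW1 file format or API.

```python
from collections import defaultdict
from itertools import combinations


def build_containment_graph(sentences):
    """Build a bipartite sentence-entity graph (assumed structure).

    sentences: iterable of (sentence_id, entities) pairs, where entities is a
    list of (entity_text, tag) tuples. This input format is hypothetical.
    Returns two adjacency maps: sentence -> entities, entity -> sentences.
    """
    sent_to_ents = defaultdict(set)
    ent_to_sents = defaultdict(set)
    for sent_id, entities in sentences:
        for text, tag in entities:
            node = (text, tag)  # an entity node is identified by text and tag
            sent_to_ents[sent_id].add(node)
            ent_to_sents[node].add(sent_id)
    return sent_to_ents, ent_to_sents


def cooccurrence_counts(sent_to_ents):
    """Count how many sentences each unordered pair of entities shares."""
    counts = defaultdict(int)
    for ents in sent_to_ents.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(ents), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Toy input with made-up sentences and CoNLL-style tags.
    toy = [
        ("s1", [("Galileo", "PER"), ("Pisa", "LOC")]),
        ("s2", [("Pisa", "LOC"), ("Italy", "LOC")]),
        ("s3", [("Galileo", "PER"), ("Pisa", "LOC")]),
    ]
    s2e, e2s = build_containment_graph(toy)
    print(sorted(e2s[("Pisa", "LOC")]))
    print(dict(cooccurrence_counts(s2e)))
```

Keeping both adjacency directions makes it cheap to ask either "which entities does this sentence contain?" or "in which sentences does this entity occur?", and the pairwise co-occurrence count is one simple way to derive entity-to-entity relations from the bipartite structure.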
