YAWN: A Semantically Annotated Wikipedia XML Corpus

Authors: Gjergji Kasneci, Fabian M. Suchanek, Ralf Schenkel

Abstract: The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
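
The category-based annotation and the kind of query it enables might be sketched as follows. This is a minimal illustration, not YAWN's actual algorithm: the `CONCEPTS` dictionary is a hypothetical hand-written stand-in for a WordNet lookup, and the tag layout (`corpus`, `article`, `title`, `body`) and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a WordNet lookup: category word -> concept name.
CONCEPTS = {"physicists": "physicist", "cities": "city"}

def concept_for_categories(categories):
    """Return the concept for the first category word found in CONCEPTS, else None."""
    for category in categories:
        for word in category.lower().split():
            if word in CONCEPTS:
                return CONCEPTS[word]
    return None

def annotate(title, text, categories):
    """Build an <article> element; wrap its content in a concept tag if one is found."""
    article = ET.Element("article")
    concept = concept_for_categories(categories)
    parent = ET.SubElement(article, concept) if concept else article
    ET.SubElement(parent, "title").text = title
    ET.SubElement(parent, "body").text = text
    return article

corpus = ET.Element("corpus")
corpus.append(annotate("Albert Einstein", "Theoretical physicist.",
                       ["German physicists", "1879 births"]))
corpus.append(annotate("Berlin", "Capital of Germany.", ["Cities in Germany"]))

# A tag-based query: titles of all pages annotated as physicists.
physicist_titles = [t.text for t in corpus.findall(".//physicist/title")]
print(physicist_titles)
```

Because the concept becomes an element name, the query selects pages by what they are about rather than by keywords that happen to occur in the text, which is the sense in which such annotations can support high-precision queries.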


