Authors: Gjergji Kasneci, Fabian M. Suchanek, Ralf Schenkel
DOI:
Keywords:
Abstract: The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.