A Template-Based Information Extraction from Web Sites with Unstable Markup

Authors: Maxim Kolchin, Fedor Kozlov

DOI: 10.1007/978-3-319-12024-9_11

Abstract: This paper presents the results of work on crawling the CEUR Workshop Proceedings site (http://ceur-ws.org) and converting it to a Linked Open Data (LOD) dataset, carried out in the framework of the ESWC 2014 Semantic Publishing Challenge (http://2014.eswc-conferences.org/semantic-publishing-challenge). Our approach is based on an extensible template-dependent crawler and uses DBpedia to link extracted entities, such as the names of universities and countries.

References (1)
Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, Christian Bizer, DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web, vol. 6, pp. 167-195 (2015), DOI: 10.3233/SW-140134