Authors: Sergio Oramas, Luis Espinosa-Anke, Mohamed Sordo, Horacio Saggion, Xavier Serra
DOI: 10.1016/J.DATAK.2016.06.001
Keywords: Information extraction, Semantic Web, Knowledge base, Natural language, Semantics, Cluster analysis, Information retrieval, Relationship extraction, Computer science, Entity linking
Abstract: The rate at which information about music is being created and shared on the web is growing exponentially. However, the challenge of making sense of all this data remains an open problem. In this paper, we present and evaluate an Information Extraction pipeline aimed at the construction of a Music Knowledge Base. Our approach starts off by collecting thousands of stories about songs from the songfacts.com website. Then, we combine a state-of-the-art Entity Linking tool and a linguistically motivated rule-based algorithm to extract semantic relations between entity pairs. Next, relations with similar semantics are grouped into clusters by exploiting syntactic dependencies. These relations are ranked thanks to a novel confidence measure based on statistical and linguistic evidence. Evaluation is carried out intrinsically, by assessing each component of the pipeline, as well as in an extrinsic task, in which we evaluate the contribution of natural language explanations in music recommendation. We demonstrate that our method is able to discover novel facts with high precision, which are missing in current generic as well as music-specific knowledge repositories.
Highlights: A system that constructs a Music Knowledge Base entirely from scratch. A method for clustering and scoring relations in a Relation Extraction pipeline. Reveals facts absent in current knowledge repositories (e.g. Wikipedia). Explains music recommendations in natural language.