Authors: Anisa Rula, Jens Lehmann, Hajira Jabeen, Gezim Sejdiu
DOI:
Keywords:
Abstract: Over the last years, Linked Data has grown continuously. Today, we count more than 10,000 datasets available online that follow Linked Data standards. These standards allow data to be machine-readable and interoperable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. There exist a few approaches for the quality assessment of Linked Data, but their performance degrades as the data size increases and quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment, an open source implementation of quality assessment for large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied in at least three use cases in the community. The results show that our approach is more generic and efficient compared to previously proposed approaches.