A Scalable Framework for Quality Assessment of RDF Datasets

Authors: Anisa Rula, Jens Lehmann, Hajira Jabeen, Gezim Sejdiu

Abstract: Over the last years, Linked Data has grown continuously. Today, we count more than 10,000 datasets being available online following Linked Data standards. These standards allow data to be machine readable and interoperable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. There exist a few approaches for the quality assessment of Linked Data, but their performance degrades with the increase in data size and quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment, an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable as compared to previously proposed approaches.
