A Scalable Framework for Quality Assessment of RDF Datasets

Authors: Anisa Rula, Jens Lehmann, Hajira Jabeen, Gezim Sejdiu

Abstract: Over the last years, Linked Data has grown continuously. Today, we count more than 10,000 datasets available online that follow Linked Data standards. These standards allow data to be machine readable and interoperable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. A few approaches exist for the quality assessment of Linked Data, but their performance degrades as data size increases, and the workload quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment, an open source implementation of quality assessment for large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable compared to previously proposed approaches.
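
The abstract does not spell out how a metric is computed, so the following is only a minimal sketch of one plausible shape for a scalable quality metric: a per-triple filter followed by counts that can be computed per partition and then merged. It is plain Python rather than Apache Spark, and it is not the DistQualityAssessment or SANSA API. The `Triple` type, the `is_external_link` predicate, the example metric (share of triples linking outside a local namespace), and all function names are assumptions made for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Triple(NamedTuple):
    """Hypothetical in-memory representation of one RDF triple."""
    subject: str
    predicate: str
    obj: str


def is_external_link(triple: Triple, local_prefix: str) -> bool:
    """Filter step: the object is an HTTP(S) IRI outside the local namespace."""
    o = triple.obj
    return o.startswith(("http://", "https://")) and not o.startswith(local_prefix)


def partition_counts(partition: Iterable[Triple], local_prefix: str) -> Tuple[int, int]:
    """Count (matching triples, all triples) within a single partition."""
    matched = 0
    total = 0
    for triple in partition:
        total += 1
        if is_external_link(triple, local_prefix):
            matched += 1
    return matched, total


def external_link_ratio(partitions: Iterable[Iterable[Triple]], local_prefix: str) -> float:
    """Merge per-partition counts into one ratio; returns 0.0 for empty input."""
    matched = 0
    total = 0
    for partition in partitions:
        m, n = partition_counts(partition, local_prefix)
        matched += m
        total += n
    return matched / total if total else 0.0


# Two hypothetical partitions of a tiny dataset rooted at http://ex.org/.
example_partitions = [
    [
        Triple("http://ex.org/a", "http://ex.org/p", "http://other.org/x"),
        Triple("http://ex.org/a", "http://ex.org/q", "label"),
    ],
    [
        Triple("http://ex.org/b", "http://ex.org/p", "http://ex.org/a"),
        Triple("http://ex.org/b", "http://ex.org/q", "42"),
    ],
]
example_ratio = external_link_ratio(example_partitions, "http://ex.org/")
```

Because the per-partition results are plain counts that add up, each partition can be processed independently and the partial results merged in any order, which is the property that lets a metric of this shape be spread across a cluster.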
