Scale-Out Processing of Large RDF Datasets

Authors: Long Cheng, Spyros Kotoulas

DOI: 10.1109/TBDATA.2015.2505719

Keywords: Big data, Data mining, Computer science, RDF, Dynamic data, Distributed database, SPARQL, Cwm, RDF query language, Scalability

Abstract: Distributed RDF data management systems are becoming increasingly important with the growth of the Semantic Web. However, current methods hit performance bottlenecks in either loading or querying when processing large amounts of data. In this work, we propose an efficient method that uses dynamic data re-partitioning to enable rapid analysis of large datasets. Our approach adopts a two-tier index architecture on each computation node: (1) a lightweight primary index, to keep loading times low, and (2) a series of dynamic, multi-level secondary indexes, calculated as a by-product of query execution, to decrease or remove inter-machine data movement for subsequent queries that contain the same graph patterns. In addition, we replace some secondary indexes with distributed filters, so as to reduce memory consumption. Experimental results on a commodity cluster with 16 nodes show that the method presents good scale-out characteristics and can indeed vastly improve loading speeds while remaining competitive in terms of performance. Specifically, our approach can load a dataset of 1.1 billion triples at a rate of 2.48 million triples per second and provide performance competitive with RDF-3X and 4store for expensive queries.
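
The two-tier idea in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the authors' implementation: the names `Cluster`, `Node`, `owner`, and `join_object_subject`, the toy hash, and the sample triples are all hypothetical. It assumes the primary index partitions triples by subject at load time, and that the first query joining on an object re-partitions the matching triples by object and keeps the result as a secondary index, so a repeat of the same pattern moves no data.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, simulated in one process


def owner(term, num_nodes=NUM_NODES):
    """Assign a term to a node with a deterministic toy hash (an assumption)."""
    return sum(term.encode("utf-8")) % num_nodes


class Node:
    def __init__(self):
        # Tier 1: lightweight primary index, subject -> list of (s, p, o).
        self.primary = defaultdict(list)
        # Tier 2: secondary indexes, (predicate, "o") -> {object: [(s, p, o)]}.
        self.secondary = {}


class Cluster:
    def __init__(self, num_nodes=NUM_NODES):
        self.nodes = [Node() for _ in range(num_nodes)]
        self.moved = 0  # count of triples shipped between nodes

    def load(self, triples):
        """Loading only builds the primary index, keeping load cost low."""
        for s, p, o in triples:
            self.nodes[owner(s, len(self.nodes))].primary[s].append((s, p, o))

    def by_object(self, predicate):
        """Return per-node object-keyed indexes for one predicate.

        The first call re-partitions triples by object and caches the result
        as a secondary index; later calls reuse it without moving data.
        """
        key = (predicate, "o")
        if all(key in node.secondary for node in self.nodes):
            return [node.secondary[key] for node in self.nodes]
        for node in self.nodes:
            node.secondary[key] = defaultdict(list)
        for src_id, src in enumerate(self.nodes):
            for triples in src.primary.values():
                for t in triples:
                    if t[1] != predicate:
                        continue
                    dst_id = owner(t[2], len(self.nodes))
                    if dst_id != src_id:
                        self.moved += 1  # simulated inter-machine transfer
                    self.nodes[dst_id].secondary[key][t[2]].append(t)
        return [node.secondary[key] for node in self.nodes]

    def join_object_subject(self, p1, p2):
        """Evaluate the pattern ?x p1 ?y . ?y p2 ?z with node-local joins."""
        results = []
        for node, index in zip(self.nodes, self.by_object(p1)):
            for y, left in index.items():
                # Triples with subject y already live on this node (tier 1).
                for _, p, z in node.primary.get(y, []):
                    if p == p2:
                        results.extend((x, y, z) for x, _, _ in left)
        return sorted(results)


# Hypothetical sample data.
cluster = Cluster()
cluster.load([
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("bob", "worksAt", "ibm"),
    ("dave", "knows", "erin"),
    ("erin", "worksAt", "tcd"),
])
first = cluster.join_object_subject("knows", "worksAt")
moved_after_first = cluster.moved
second = cluster.join_object_subject("knows", "worksAt")
```

After the second call, `cluster.moved` equals `moved_after_first`: the repeated graph pattern is answered from the secondary index alone. The sketch leaves out the distributed filters mentioned in the abstract.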
