Authors: Long Cheng, Spyros Kotoulas
DOI: 10.1109/TBDATA.2015.2505719
Keywords: Big data, Data mining, Computer science, RDF, Dynamic data, Distributed database, SPARQL, Cwm, RDF query language, Scalability
Abstract: Distributed RDF data management systems are becoming increasingly important with the growth of the Semantic Web. However, current methods meet performance bottlenecks in either loading or querying when processing large amounts of data. In this work, we propose efficient methods that use dynamic re-partitioning to enable rapid analysis of large datasets. Our approach adopts a two-tier index architecture on each computation node: (1) a lightweight primary index, to keep loading times low, and (2) a series of dynamic, multi-level secondary indexes, calculated as a by-product of query execution, to decrease or remove inter-machine data movement for subsequent queries that contain the same graph patterns. In addition, we replace some secondary indexes with distributed filters, so as to reduce memory consumption. Experimental results on a commodity cluster with 16 nodes show that the method presents good scale-out characteristics and can indeed vastly improve loading speeds while remaining competitive in terms of performance. Specifically, our approach can load a dataset of 1.1 billion triples at a rate of 2.48 million triples per second and provide performance competitive with RDF-3X and 4store for expensive queries.
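
The two-tier idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the `Cluster` class, the `owner` placement function, the choice of CRC32 hashing, and the predicate-keyed primary index are all assumptions made for illustration. Triples are first placed by subject (cheap to load); the first query that needs them grouped by object re-partitions them and keeps the result as a secondary index, so a later query with the same pattern moves no data.

```python
import zlib
from collections import defaultdict


def owner(term: str, n_nodes: int) -> int:
    """Hypothetical placement rule: deterministic hash of a term modulo node count."""
    return zlib.crc32(term.encode("utf-8")) % n_nodes


class Cluster:
    """Toy model of n nodes, each with a primary and a secondary index tier."""

    def __init__(self, n_nodes: int):
        self.n = n_nodes
        # Primary tier (assumed layout): per node, predicate -> list of triples.
        self.primary = [defaultdict(list) for _ in range(n_nodes)]
        # Secondary tier: per node, (predicate, "o") -> {object -> list of triples}.
        self.secondary = [dict() for _ in range(n_nodes)]

    def load(self, triples):
        """Place each triple on the node that owns its subject; no other indexing."""
        for s, p, o in triples:
            self.primary[owner(s, self.n)][p].append((s, p, o))

    def by_object(self, predicate: str):
        """Return (per-node object-keyed index, number of triples moved between nodes)."""
        key = (predicate, "o")
        if all(key in sec for sec in self.secondary):
            # Secondary index already exists: reuse it, nothing moves.
            return [sec[key] for sec in self.secondary], 0
        fresh = [defaultdict(list) for _ in range(self.n)]
        moved = 0
        for src in range(self.n):
            for triple in self.primary[src].get(predicate, []):
                dst = owner(triple[2], self.n)
                moved += dst != src  # count only cross-node transfers
                fresh[dst][triple[2]].append(triple)
        # Keep the re-partitioned data as a by-product of the query.
        for node, index in enumerate(fresh):
            self.secondary[node][key] = index
        return fresh, moved
```

Calling `by_object("knows")` twice on a loaded `Cluster` returns the same index both times, but the second call reports zero moved triples, which is the effect the abstract attributes to the secondary indexes.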