Authors: HyeongSik Kim, Padmashree Ravindra, Kemafor Anyanwu
Keywords: Data modeling, Cost effectiveness, Computer science, SPARQL, RDF, Cloud computing, Database, Analytics, Data model, Semantic Web, Workflow
Abstract: Scalable processing of Semantic Web queries has become a critical need given the rapid upward trend in the availability of Semantic Web data. The MapReduce paradigm is emerging as a platform of choice for large-scale data processing and analytics due to its ease of use, cost effectiveness, and potential for unlimited scaling. Processing queries on Semantic Web triple models is a challenge on the mainstream MapReduce platform, Apache Hadoop, and on its extensions such as Pig and Hive. This is because such queries require numerous joins, which leads to lengthy and expensive MapReduce workflows. Further, in this paradigm, cloud resources are acquired on demand, and the traditional join optimization machinery, such as statistics and indexes, is often absent or not easily supported. In this demonstration, we will present RAPID+, an extended Apache Pig system that uses an algebraic approach for optimizing queries on RDF data models, including queries involving inferencing. The basic idea is that by using logical and physical operators that are more natural to MapReduce processing, we can reinterpret such queries in a way that leads to more concise execution workflows and small intermediate data footprints that minimize disk I/Os and network transfer overhead. RAPID+ evaluates queries using the Nested TripleGroup Data Model and Algebra (NTGA). The demo will show the comparative performance of NTGA query plans vs. the relational algebra-like query plans used by Apache Pig and Hive.
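
The idea of replacing several joins with a grouping operator can be illustrated with a minimal Python sketch. It is not RAPID+ code and does not reproduce the NTGA operators; it assumes that a "triplegroup" is simply the set of triples sharing a subject, and the function names `group_by_subject` and `match_star` as well as the `ex:` sample data are hypothetical. The point is only that a star-shaped pattern (several properties of one subject), which a relational plan would answer with one self-join per extra property, can be answered with a single group-by pass followed by a filter.

```python
from collections import defaultdict


def group_by_subject(triples):
    """Group (subject, property, object) triples by subject.

    Assumption: each resulting group stands in for a 'triplegroup'.
    A single grouping pass maps naturally onto one map/reduce cycle.
    """
    groups = defaultdict(list)
    for subject, prop, obj in triples:
        groups[subject].append((prop, obj))
    return dict(groups)


def match_star(groups, required_props):
    """Keep only the groups that contain every required property.

    This plays the role of a star-join: no pairwise joins are needed,
    only a per-group membership check.
    """
    required = set(required_props)
    return {
        subject: pairs
        for subject, pairs in groups.items()
        if required <= {prop for prop, _ in pairs}
    }


# Hypothetical sample data, for illustration only.
triples = [
    ("ex:p1", "ex:name", "Widget"),
    ("ex:p1", "ex:price", "10"),
    ("ex:p2", "ex:name", "Gadget"),
]

groups = group_by_subject(triples)
matches = match_star(groups, ["ex:name", "ex:price"])
```

Here `matches` keeps only `ex:p1`, the one subject that has both a name and a price. A plan built from pairwise joins would instead produce and shuffle an intermediate result for each additional property in the pattern.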