Authors: Li Weng, Umit Catalyurek, Tahsin Kurc, Gagan Agrawal, Joel Saltz
DOI: 10.1109/GRID.2007.4354141
Keywords:
Abstract: We propose strategies to efficiently execute a query workload, which consists of multiple related queries submitted against a scientific dataset, on a distributed-memory system in the presence of partial dataset replicas. Partial replication re-organizes and re-distributes one or more subsets of a dataset across the storage system to reduce I/O overheads and increase I/O parallelism. Our work targets a class of queries, called range queries, in which the query predicate specifies lower and upper bounds on the values of all or a subset of the attributes of a dataset. Data elements whose attribute values fall into the specified bounds are retrieved from the dataset. If we think of the attributes of a dataset as forming a multi-dimensional space, where each attribute corresponds to one of the dimensions, a range query defines a bounding box in this multi-dimensional space. We evaluate our strategies in two scenarios involving range queries. The first scenario represents the case in which queries have overlapping regions of interest, such as those arising from an exploratory analysis of the dataset by multiple users. In the second scenario, queries represent adjacent rectilinear sections that capture an irregular subregion in a multi-dimensional space. This scenario corresponds to a case where the user wants to query and retrieve a spatial feature from the dataset. We propose cost models and an algorithm for optimizing such queries. Our results using queries for subsetting and analysis of medical image datasets show that effective use of partial replicas can result in a reduction in query execution times.
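The abstract's description of a range query as a bounding box can be illustrated with a minimal Python sketch. The class name `RangeQuery`, its methods, and the attribute names `x` and `y` are hypothetical, chosen only for illustration; the sketch does not model the paper's cost models, replica selection, or optimization algorithm.

```python
from dataclasses import dataclass
from math import inf
from typing import Dict, Optional, Tuple

Bounds = Tuple[float, float]  # inclusive (lower, upper) bounds for one attribute


@dataclass
class RangeQuery:
    """Hypothetical range query: per-attribute bounds forming a bounding box."""
    bounds: Dict[str, Bounds]

    def matches(self, element: Dict[str, float]) -> bool:
        # An element is retrieved when every constrained attribute
        # falls inside its [lower, upper] interval.
        return all(lo <= element[attr] <= hi
                   for attr, (lo, hi) in self.bounds.items())

    def intersect(self, other: "RangeQuery") -> Optional["RangeQuery"]:
        # Overlapping region of interest of two queries, or None if disjoint.
        # An attribute left unconstrained by one query is treated as unbounded.
        merged: Dict[str, Bounds] = {}
        for attr in self.bounds.keys() | other.bounds.keys():
            lo1, hi1 = self.bounds.get(attr, (-inf, inf))
            lo2, hi2 = other.bounds.get(attr, (-inf, inf))
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo > hi:
                return None
            merged[attr] = (lo, hi)
        return RangeQuery(merged)


q1 = RangeQuery({"x": (0, 10), "y": (0, 10)})
q2 = RangeQuery({"x": (5, 15), "y": (8, 20)})
overlap = q1.intersect(q2)
```

Here `overlap` is the box shared by the two queries, the kind of overlapping region of interest described in the first scenario; two queries whose boxes do not meet in some dimension yield `None`.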