Optimizing multiple queries on scientific datasets with partial replicas

Authors: Li Weng, Umit Catalyurek, Tahsin Kurc, Gagan Agrawal, Joel Saltz

DOI: 10.1109/GRID.2007.4354141

Abstract: We propose strategies to efficiently execute a query workload, which consists of multiple related queries submitted against a scientific dataset, on a distributed-memory system in the presence of partial dataset replicas. Partial replication re-organizes and re-distributes one or more subsets of a dataset across the storage system to reduce I/O overheads and increase I/O parallelism. Our work targets a class of queries, called range queries, in which the query predicate specifies lower and upper bounds on the values of all or a subset of the attributes of a dataset. Data elements whose attribute values fall into the specified bounds are retrieved from the dataset. If we think of the attributes of a dataset as forming a multi-dimensional space, where each attribute corresponds to one of the dimensions, a range query defines a bounding box in this multi-dimensional space. We evaluate our strategies in two scenarios involving range queries. The first scenario represents the case in which queries have overlapping regions of interest, such as those arising from an exploratory analysis of the dataset by multiple users. In the second scenario, queries represent adjacent rectilinear sections that capture an irregular subregion in a multi-dimensional space. This scenario corresponds to a case where the user wants to query and retrieve a spatial feature from the dataset. We propose cost models and an algorithm for optimizing such queries. Our results, using queries for subsetting and analysis of medical image datasets, show that effective use of partial replicas can result in a reduction in query execution times.
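The bounding-box view of a range query described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RangeQuery` class and its `contains` and `intersect` methods are hypothetical names introduced here, and inclusive bounds are an assumption. The sketch only shows how a query selects data elements and how the region shared by two overlapping queries (the first scenario) can be computed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical axis-aligned bounding box: one inclusive (low, high) pair per attribute."""
    bounds: Tuple[Tuple[float, float], ...]

    def contains(self, point: Sequence[float]) -> bool:
        # A data element is selected when every attribute value lies within its bounds.
        return all(lo <= x <= hi for x, (lo, hi) in zip(point, self.bounds))

    def intersect(self, other: "RangeQuery") -> Optional["RangeQuery"]:
        # Per dimension, the overlap runs from the larger low bound to the smaller high bound.
        box = tuple(
            (max(a_lo, b_lo), min(a_hi, b_hi))
            for (a_lo, a_hi), (b_lo, b_hi) in zip(self.bounds, other.bounds)
        )
        # An empty interval in any dimension means the two queries do not overlap.
        if any(lo > hi for lo, hi in box):
            return None
        return RangeQuery(box)


# Two overlapping 2-D queries, as in the exploratory-analysis scenario.
q1 = RangeQuery(((0, 10), (0, 10)))
q2 = RangeQuery(((5, 15), (5, 15)))
shared = q1.intersect(q2)
```

Here `shared` describes the region requested by both queries. A multi-query strategy could, in principle, read that region once and reuse it for both queries; how the paper's cost models and algorithm actually exploit such overlap is not specified in this record.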
