Clustered Workflow Execution of Retargeted Data Analysis Scripts

Authors: Daniel L. Wang, Charles S. Zender, Stephen F. Jenks

DOI: 10.1109/CCGRID.2008.69

Keywords:

Abstract: Supercomputing advances have enabled computational science data volumes to grow at ever increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of the data they can analyze at a workstation. Our system supplements existing scientific data services with a lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server, where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and giving end-users results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts built from scientists' most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geo-science data analysis script, illustrating the crucial performance gains of extracting the workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist's desktop.

References (19)
Pinar Senkul, Michael Kifer, Ismail H. Toroslu. A logical framework for scheduling workflows under resource allocation constraints. Very Large Data Bases, pp. 694-705 (2002). DOI: 10.1016/B978-155860869-6/50067-6
Duane C. Hanselman, Bruce L. Littlefield. Mastering MATLAB 7 (2004).
Don Box, David Ehnebuske, Gopal Kakivaya, Andrew Layman, Noah Mendelsohn, Henrik Frystyk Nielsen, Satish Thatte, Dave Winer. Simple object access protocol (SOAP) 1.1. W3C Note (2000).
Daniel L. Wang, Charles S. Zender, Stephen F. Jenks. Server-side parallel data reduction and analysis. Grid and Pervasive Computing, pp. 744-750 (2007). DOI: 10.1007/978-3-540-72360-8_67
J. Darlington, M. Ghanem, H.W. To. Structured parallel programming. Proceedings of Workshop on Programming Models for Massively Parallel Computers, pp. 160-169 (1993). DOI: 10.1109/PMMP.1993.315543
Charles S. Zender. Short communication: Analysis of self-describing gridded geoscience data with netCDF Operators (NCO). Environmental Modelling and Software, vol. 23, pp. 1338-1342 (2008). DOI: 10.1016/J.ENVSOFT.2008.03.004
Luca Clementi, Sriram Krishnan, Wesley Goodman, Jingyuan Ren, Wilfred W. Li, Peter W. Arzberger, Guillaume Vareille, Sargis Dallakyan, Michel F. Sanner. Services Oriented Architecture for Managing Workflows of Avian Flu Grid. IEEE International Conference on eScience, pp. 582-589 (2008). DOI: 10.1109/ESCIENCE.2008.37
Matt Mathis, John Heffner, Raghu Reddy. Web100. ACM SIGCOMM Computer Communication Review, vol. 33, pp. 69-79 (2003). DOI: 10.1145/956993.957002
Ian Taylor, Ian Wang, Matthew Shields, Shalil Majithia. Distributed computing with Triana on the Grid. Concurrency and Computation: Practice and Experience, vol. 17, pp. 1197-1214 (2005). DOI: 10.1002/CPE.901
I. Altintas, C. Berkley, E. Jaeger, B. Ludascher, M. Jones, S. Mock. Kepler: an extensible system for design and execution of scientific workflows. Statistical and Scientific Database Management, vol. 16, pp. 423-424 (2004). DOI: 10.1109/SSDBM.2004.44