Authors: Daniel L. Wang, Charles S. Zender, Stephen F. Jenks
Keywords:
Abstract: Supercomputing advances have enabled computational science data volumes to grow at ever increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze at a workstation. Our system supplements existing scientific services with a lightweight capability, providing a means of safely relocating analysis from the desktop to the server, where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and giving end-users results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts that use scientists' most common tools enable parallelization and optimizations of disk and network I/O bandwidth. We benchmark using an actual geo-science script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist's desktop.