Authors: Tao Zhang, XiangZheng Sun, Wei Xue, Nan Qiao, Huang Huang
DOI: 10.1016/J.FUTURE.2014.10.015
Keywords: Visualization, Scheduling (computing), Distributed computing, NetCDF, Computer science, Scalability, Distributed File System, Parallel computing, Data-intensive computing
Abstract: Scientific data analysis and visualization have become key components of today's large-scale simulations. Because of rapidly increasing data volumes and the awkward I/O patterns of highly structured files, known serial methods and tools do not scale well and usually perform poorly on traditional architectures. In this paper, we propose a new framework, ParSA (parallel scientific data analysis), for high-throughput, scalable scientific analysis on a distributed file system. ParSA presents optimization strategies for grouping and splitting logical units to exploit the distributed I/O property of the distributed file system, for scheduling the distribution of block replicas to reduce network reads, and for maximizing the overlap of data reading, processing, and transferring during computation. In addition, ParSA provides interfaces similar to those of the NetCDF Operator (NCO), which is used in most climate data diagnostic packages, making the framework easy to adopt. We use ParSA to accelerate well-known analysis methods for climate models on the Hadoop Distributed File System (HDFS). Experimental results demonstrate the efficiency and scalability of ParSA, which reaches a maximum throughput of 1.3 GB/s on a six-node cluster with five disks per node, compared with only 392 MB/s on RAID-6 storage.