Authors: Chialin Chang, Anurag Acharya, Alan Sussman, Joel Saltz
Keywords: Modular design, Computer science, Computer data storage, Data retrieval, Visualization, Parallel database, Grid, Spatial reference system, Data mining
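Abstract: As computational power and storage capacity increase, processing and analyzing large volumes of data play an increasingly important part in many domains of scientific research. Typical examples of large scientific datasets include long running simulations of time-dependent phenomena that periodically generate snapshots of their state (e.g. hydrodynamics and chemical transport simulation for estimating pollution impact on water bodies [4, 6, 20], magnetohydrodynamics simulation of planetary magnetospheres [32], simulation of a flame sweeping through a volume [28], airplane wake simulations [21]), archives of raw and processed remote sensing data (e.g. AVHRR [25], Thematic Mapper [17], MODIS [22]), and archives of medical images (e.g. confocal light microscopy, CT imaging, MRI, sonography).

These datasets are usually multi-dimensional. The dimensions can be spatial coordinates, time, or experimental conditions such as temperature, velocity or magnetic field. The importance of such datasets has been recognized by several database research groups and vendors, and several systems have been developed for managing and/or visualizing them [2, 7, 14, 19, 26, 27, 29, 31].

These systems, however, focus on lineage management, retrieval and visualization of multi-dimensional datasets. They provide little or no support for analyzing or processing these datasets; the assumption is that this is too application-specific to warrant common support. As a result, applications that process these datasets are usually decoupled from data storage and management, resulting in inefficiency due to copying and loss of locality. Furthermore, every application developer has to implement complex support for managing and scheduling the processing.

Over the past three years, we have been working with several scientific research groups to understand the processing requirements for such applications [1, 5, 10, 18, 23, 24, 28]. Our study of a large set of applications indicates that the processing for such datasets is often highly stylized and shares several important characteristics. Usually, both the input dataset as well as the result being computed have underlying multi-dimensional grids, and queries into the dataset are in the form of ranges within each dimension of the grid. The basic processing step usually consists of transforming individual input items, mapping the transformed items to the output grid, and computing output items by aggregating, in some way, all the transformed input items mapped to the corresponding grid point. For example, remote-sensing earth images are usually generated by performing atmospheric correction on several days' worth of raw telemetry data, mapping all the data to a latitude-longitude grid, and selecting those measurements that provide the clearest view.

In this paper, we present T2, a customizable parallel database that integrates storage, retrieval and processing of multi-dimensional datasets. T2 provides support for many operations including index generation, data retrieval, memory management, scheduling of processing across a parallel machine and user interaction. It achieves its primary advantage from the ability to seamlessly integrate data retrieval and processing for a wide variety of applications and from the ability to maintain and process multiple datasets with different underlying grids. Most other systems for multi-dimensional data have focused on uniformly distributed datasets, such as images, maps, and dense multi-dimensional arrays. Many real datasets, however, are non-uniform or unstructured. For example, satellite data is a two dimensional strip that is embedded in a three dimensional space; water contamination studies use unstructured meshes to selectively simulate regions, and so on. T2 can handle both uniform and non-uniform datasets.

T2 has been developed as a set of modular services. Since its structure mirrors that of a wide variety of applications, T2 is easy to customize for different types of processing. To build a version of T2 customized for a particular application, a user has to provide functions to pre-process the input data, map input data to elements in the output data, and aggregate multiple input data items that map to the same output element.

T2 presents a uniform interface to the end users (the clients of the database system). Users specify the dataset(s) of interest, a region of interest within the dataset(s), and the desired format and resolution of the output. In addition, they select the mapping and aggregation functions to be used. T2 analyzes the user request, builds a suitable plan to retrieve and process the datasets, executes the plan and presents the results in the desired format.

In Section 2 we first present several motivating applications and illustrate their common structure. Section 3 then presents an overview of T2, including its distinguishing features and a running example. Section 4 describes each database service in some detail. An example of how to customize several of the database services for a particular application is given in Section 5. T2 is a system in evolution. We conclude in Section 6 with a description of the current status of both the design and implementation of the various T2 services.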
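The processing structure described in the abstract (select items by a range query, transform each item, map it to an output grid cell, and aggregate all items that land on the same cell) can be illustrated with a minimal Python sketch. This is not T2's interface: `Item`, `run_query`, `correct`, `to_cell` and `clearest` are hypothetical names chosen for illustration, the `clarity` field is an assumed quality score, and the "atmospheric correction" is a placeholder subtraction. The sketch is sequential and in-memory, so it shows only the shape of the user-supplied functions, not indexing, scheduling or parallel execution.

```python
import math
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, Tuple

Cell = Tuple[int, int]          # index of one output grid point
Range = Tuple[float, float]     # half-open range [low, high) in one dimension


@dataclass(frozen=True)
class Item:
    """One input measurement (hypothetical layout, not T2's data model)."""
    lat: float
    lon: float
    value: float
    clarity: float  # assumed quality score; higher means a clearer view


def run_query(
    items: Iterable[Item],
    lat_range: Range,
    lon_range: Range,
    transform: Callable[[Item], Item],
    map_to_cell: Callable[[Item], Cell],
    aggregate: Callable[[Item, Item], Item],
) -> Dict[Cell, Item]:
    """Range-select, transform, map to the output grid, and aggregate."""
    output: Dict[Cell, Item] = {}
    for item in items:
        # The query is a range within each dimension of the input grid.
        if not (lat_range[0] <= item.lat < lat_range[1]):
            continue
        if not (lon_range[0] <= item.lon < lon_range[1]):
            continue
        transformed = transform(item)
        cell = map_to_cell(transformed)
        if cell in output:
            output[cell] = aggregate(output[cell], transformed)
        else:
            output[cell] = transformed
    return output


# User-supplied customization functions (all illustrative).
def correct(item: Item) -> Item:
    """Placeholder for atmospheric correction: subtract a constant offset."""
    return replace(item, value=item.value - 1.0)


def to_cell(item: Item) -> Cell:
    """Map an item to a 1-degree latitude-longitude output grid."""
    return (math.floor(item.lat), math.floor(item.lon))


def clearest(a: Item, b: Item) -> Item:
    """Keep whichever measurement has the clearer view."""
    return a if a.clarity >= b.clarity else b


if __name__ == "__main__":
    telemetry = [
        Item(10.2, 20.7, 5.0, 0.3),
        Item(10.8, 20.1, 7.0, 0.9),
        Item(11.5, 20.5, 4.0, 0.5),
        Item(50.0, 50.0, 9.0, 1.0),  # outside the queried region
    ]
    composite = run_query(
        telemetry, (10.0, 12.0), (20.0, 21.0), correct, to_cell, clearest
    )
    for cell, item in sorted(composite.items()):
        print(cell, item.value)
```

Passing the three functions as arguments mirrors the customization idea in the text: the query loop stays the same across applications, and only the pre-processing, mapping and aggregation steps change.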