Authors: Jeffrey P. Gardner, Andrew Connolly, Cameron McBride
DOI:
Keywords: Data science, Distributed computing, Ntropy, Massively parallel, Computer science, TeraGrid, Scalability, Knowledge extraction, Tree (data structure), Task (computing), Range (computer programming)
Abstract: Virtual observatories will give astronomers easy access to an unprecedented amount of data. Extracting scientific knowledge from these data will increasingly demand both efficient algorithms and the power of parallel computers. Such machines will range in size from small Beowulf clusters to large massively parallel platforms (MPPs) to collections of MPPs distributed across a Grid, such as the NSF TeraGrid facility. Nearly all analyses of astronomical datasets use trees as their fundamental data structure. Writing tree-based techniques, a task that is time-consuming even on single-processor computers, is exceedingly cumbersome on parallel or grid-distributed resources. We have developed a library, Ntropy, that provides a flexible, extensible, and easy-to-use way of developing tree-based analysis algorithms for both serial and parallel platforms. Our experience has shown that not only does our library save development time, it also delivers an increase in performance. Furthermore, Ntropy makes it easy for an astronomer with little or no programming experience to quickly scale an application to a multiprocessor environment. By minimizing the time needed to develop scalable analysis, we enable wide-scale knowledge discovery on massive datasets.