A View from ORNL: Scientific Data Research Opportunities in the Big Data Age

Authors: Scott Klasky, Matthew Wolf, Mark Ainsworth, Chuck Atkins, Jong Choi

DOI: 10.1109/ICDCS.2018.00136

Keywords: Data science, Development plan, Scalability, Software, SPARK (programming language), Big data, Computer science, Workflow, Data visualization, Data modeling

Abstract: One of the core issues across computer and computational science today is adapting to, managing, and learning from the influx of "Big Data". In the commercial space, this problem has led to a huge investment in new technologies and capabilities that are well adapted to dealing with the sorts of human-generated logs, videos, texts, and other large-data artifacts being processed, and it has resulted in an explosion of useful platforms and languages (Hadoop, Spark, Pandas, etc.). However, translating this work from the enterprise space to the HPC community has proven somewhat difficult, in part because of fundamental differences in the type and scale of the data and in the timescales surrounding its generation and use. We describe a forward-looking research and development plan that centers on the concept of making Input/Output (I/O) intelligent for users in the scientific community, whether they are accessing scalable storage or performing in situ workflow tasks. Much of the plan is based on our experience with the Adaptable I/O System (ADIOS 1.X) and the next version of the software, ADIOS 2.X [1].

References (43)
Ilkay Altintas, Chad Berkley, Edward A. Lee, Efrat Jaeger, Bertram Ludäscher, Matthew Jones, Jing Tao, Yang Zhao, Dan Higgins. Scientific workflow management and the Kepler system. Concurrency and Computation: Practice and Experience, vol. 18, pp. 1039-1065, 2006. DOI: 10.1002/CPE.V18:10
Jai Dayal, Jay Lofstead, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Hasan Abbasi, Scott Klasky. SODA: Science-Driven Orchestration of Data Analytics. 2015 IEEE 11th International Conference on e-Science, pp. 475-484, 2015. DOI: 10.1109/ESCIENCE.2015.59
Qing Liu, Jeremy Logan, Yuan Tian, Hasan Abbasi, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, Roselyne Tchoua, Jay Lofstead, Ron Oldfield, Manish Parashar, Nagiza Samatova, Karsten Schwan, Arie Shoshani, Matthew Wolf, Kesheng Wu, Weikuan Yu. Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks. Concurrency and Computation: Practice and Experience, vol. 26, pp. 1453-1473, 2014. DOI: 10.1002/CPE.3125
Lipeng Wan, Zheng Lu, Qing Cao, Feiyi Wang, Sarp Oral, Bradley Settlemyer. SSD-optimized workload placement with adaptive learning and classification in HPC environments. IEEE Conference on Mass Storage Systems and Technologies (MSST), pp. 1-6, 2014. DOI: 10.1109/MSST.2014.6855552
Torsten Hoefler, Marc Snir. Generic topology mapping strategies for large-scale parallel architectures. Proceedings of the International Conference on Supercomputing (ICS '11), pp. 75-84, 2011. DOI: 10.1145/1995896.1995909
C. S. Chang, S. Ku, P. H. Diamond, Z. Lin, S. Parker, T. S. Hahm, N. Samatova. Compressed ion temperature gradient turbulence in diverted tokamak edge. Physics of Plasmas, vol. 16, p. 056108, 2009. DOI: 10.1063/1.3099329
W. Dorland, F. Jenko, M. Kotschenreuther, B. N. Rogers. Electron temperature gradient turbulence. Physical Review Letters, vol. 85, pp. 5579-5582, 2000. DOI: 10.1103/PHYSREVLETT.85.5579
Philip Carns, Robert Latham, Robert Ross, Kamil Iskra, Samuel Lang, Katherine Riley. 24/7 Characterization of petascale I/O workloads. IEEE International Conference on Cluster Computing, pp. 1-10, 2009. DOI: 10.1109/CLUSTR.2009.5289150
Fang Zheng, Hongfeng Yu, Can Hantas, Matthew Wolf, Greg Eisenhauer, Karsten Schwan, Hasan Abbasi, Scott Klasky. GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution. International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13), art. 78, 2013. DOI: 10.1145/2503210.2503279