Prescriptive provenance for streaming analysis of workflows at scale

Authors: Line Pouchard, Kevin Huck, Gyorgy Matyasfalvi, Dingwen Tao, Li Tang

DOI: 10.1109/NYSDS.2018.8538951

Abstract: We extend our approach of capturing and relating the provenance and performance metrics of computational workflows as a diagnostic tool for runtime optimization and placement. One important challenge is the volume of extracted data, both performance and provenance, even when specifying filters focusing on quantities of interest in a simulation. We reduce this data by performing anomaly detection on the streaming data and store only the detected anomalies, an approach we call prescriptive provenance. This paper discusses the Chimbuko architecture enabling this approach. We present a use case of a protein structure propagation workflow based on NWChemEx. We are testing anomaly detection algorithms, and preliminary results are presented here, obtained with Local Outlier Factor. While scaling remains a challenge, these results show that a robust analysis is promising.
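
The idea of scoring streamed measurements with Local Outlier Factor and keeping only the flagged records can be illustrated with a minimal sketch. It is not the Chimbuko implementation: the event fields (`func`, `runtime_ms`), the fixed window size, the choice of `k`, and the threshold of 2.0 are all assumptions made for illustration, and the LOF routine is a plain standard-library version of the published definition rather than any library's API.

```python
import math
from statistics import mean


def lof_scores(points, k=3):
    """Local Outlier Factor of each point; requires len(points) > k."""
    n = len(points)
    # For every point, all other points sorted by distance: (distance, index).
    ranked = [
        sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [r[k - 1][0] for r in ranked]
    # k-neighbourhood: every point within the k-distance (ties included).
    neigh = [[(d, j) for d, j in r if d <= k_dist[i]] for i, r in enumerate(ranked)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = mean(max(k_dist[j], d) for d, j in neigh[i])
        lrd.append(1.0 / max(reach, 1e-12))  # guard against duplicate points
    # LOF: mean density of the neighbours relative to the point's own density.
    return [mean(lrd[j] for _, j in neigh[i]) / lrd[i] for i in range(n)]


def prescriptive_filter(stream, window=8, k=3, threshold=2.0):
    """Yield only events whose LOF exceeds the threshold, window by window.

    Assumption: events left over in a final, incomplete window are dropped.
    """
    buffer = []
    for event in stream:
        buffer.append(event)
        if len(buffer) == window:
            scores = lof_scores([(e["runtime_ms"],) for e in buffer], k)
            for e, score in zip(buffer, scores):
                if score > threshold:
                    yield {**e, "lof": score}
            buffer = []


# Hypothetical stream of function-call timings with one slow call.
runtimes = [10.0, 10.2, 9.9, 10.1, 55.0, 10.3, 9.8, 10.05]
events = [{"func": "md_step", "runtime_ms": t} for t in runtimes]

anomalies = list(prescriptive_filter(events))
for a in anomalies:
    print(a["func"], a["runtime_ms"], round(a["lof"], 1))
```

Only the records that pass the threshold are retained, which is the data reduction the abstract describes. A real deployment would need to choose the window, `k`, and threshold from the workload rather than from the fixed values assumed here.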
