Authors: Allen D. Malony, Srinivasan Ramesh, Kevin Huck, Chad Wood, Sameer Shende
DOI: 10.1109/HPCS48598.2019.9188097
Keywords: Software engineering, Scalable computing, Computer science, Analytics, Measure (data warehouse), Performance measurement, Work (electrical)
Abstract: Developers of scientific simulations use parallel performance systems to measure, analyze, and tune their applications on large-scale HPC machines. In the majority of these systems, analysis takes place offline. More consequentially, if runtime analytics are desired, the measurement infrastructures need to be designed and implemented in such a way as to make it possible. We investigate the question of how to create runtime analytics capabilities in parallel performance systems by considering this objective for a reference platform, the TAU Performance System. Our research work identifies general issues of concern and describes how they can be addressed in a new TAU-based runtime analytics framework. Several case studies are proposed as different examples of runtime analytics. These are prototyped, evaluated on HPC machines, and discussed. The outcomes of the study suggest that runtime analytics in parallel performance systems has merit. Furthermore, we believe the approach could directly carry forward to other parallel performance systems.