Authors: Haroon Malik, Elhadi M. Shakshuki
DOI: 10.1016/J.PROCS.2014.08.036
Keywords:
Abstract: Today's large-scale software systems (LSSs), such as Facebook, Google, Amazon and many other contemporary datacenters, comprise hundreds or thousands of machines running complex applications that require high availability and responsiveness. These LSSs must be carefully monitored for performance bottlenecks before serious harm is done. Performance analysts have to deal with the tedious task of monitoring these systems to avoid any service level agreement (SLA) violations and to ensure their failure-free operation. Several post-deployment diagnostic (PPD) techniques exist to help diagnose problems in the field, i.e., after the systems are deployed. However, no classification of PPD techniques has been proposed to understand their objectives and characteristics. In this paper, we classify existing PPD techniques along multiple categories. The classification will provide a guideline for practitioners of LSSs to choose the techniques suitable to their needs. Moreover, it will also help researchers fill gaps and dedicate research efforts to categories that have received little attention in the past.