Classification of Post-deployment Performance Diagnostic Techniques for Large-scale Software Systems

Authors: Haroon Malik, Elhadi M. Shakshuki

DOI: 10.1016/J.PROCS.2014.08.036

Abstract: Today's large-scale software systems (LSSs), such as Facebook, Google, Amazon, and many other contemporary datacenters, comprise hundreds or thousands of machines running complex applications that require high availability and responsiveness. These LSSs must be carefully monitored for performance bottlenecks before serious harm is done. Performance analysts have to deal with the tedious task of monitoring these LSSs to avoid any service level agreement (SLA) violations and to ensure their failure-free operation. Several post-deployment performance diagnostic (PPD) techniques exist to help diagnose problems in the field, i.e., after deployment. However, no classification of PPD techniques has been proposed to help understand their objectives and characteristics. In this paper, we classify existing PPD techniques along multiple categories. The classification will provide a guideline for LSS practitioners to choose a technique suitable to their needs. Moreover, it will also help researchers fill the gaps and dedicate research efforts to categories that have received little attention in the past.
