Distributed Diagnosis of Failures in a Three Tier E-Commerce System

Authors: Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad, Saurabh Bagchi

DOI: 10.1109/SRDS.2007.16

Abstract: For dependability outages in distributed Internet infrastructures, it is often not enough to detect a failure; it is also required to diagnose it, i.e., to identify its source. Complex applications deployed in multi-tier environments make diagnosis challenging because of fast error propagation, black-box applications, high diagnosis delay, the amount of state that can be maintained, and imperfect diagnostic tests. Here, we propose a probabilistic diagnosis model for arbitrary failures in components of a distributed application. The monitoring system (the Monitor) passively observes the message exchanges between the components and, at runtime, performs a probabilistic diagnosis of the component that was the root cause of a failure. We demonstrate the approach by applying it to the Pet Store J2EE application, and we compare it with Pinpoint by quantifying the latency and accuracy of both systems. The Monitor outperforms Pinpoint by achieving comparably accurate diagnosis with higher precision in a shorter time.
