Authors: Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad, Saurabh Bagchi
DOI: 10.1109/SRDS.2007.16
Keywords:
Abstract: For dependability outages in distributed Internet infrastructures, it is often not enough to detect a failure; it is also required to diagnose it, i.e., to identify its source. Complex applications deployed in multi-tier environments make diagnosis challenging because of fast error propagation, black-box applications, high delay, the amount of state that can be maintained, and imperfect diagnostic tests. Here, we propose a probabilistic model for diagnosing arbitrary failures in the components of an application. The monitoring system (the Monitor) passively observes message exchanges between the components and, at runtime, performs a diagnosis of which component was the root cause of a failure. We demonstrate the approach by applying it to the Pet Store J2EE application, and we compare it with Pinpoint by quantifying the latency and accuracy of both systems. The Monitor outperforms Pinpoint, achieving comparably accurate diagnosis with higher precision in a shorter time.