Using queries for distributed monitoring and forensics

Authors: Atul Singh, Petros Maniatis, Timothy Roscoe, Peter Druschel

DOI: 10.1145/1217935.1217973

Abstract: Distributed systems are hard to build, profile, debug, and test. Monitoring a distributed system (to detect and analyze bugs, test for regressions, identify fault-tolerance problems or security compromises) can be difficult and error-prone. In this paper we argue that declarative development of distributed systems is well suited to tackle these tasks. We present an application logging, monitoring, and debugging facility that we have built on top of the P2 system, comprising an introspection model, an execution tracing component, and a distributed query processor. We use this facility to demonstrate a range of on-line distributed diagnosis tools, from simple, local state assertions to sophisticated global property detectors on consistent snapshots. These tools are small, simple, and can be deployed piecemeal on-line at any point during a system's life cycle. Our evaluation suggests that the overhead of our approach to improving and monitoring running distributed systems continuously is well in tune with its benefits.
