Authors: Barton P. Miller, Alexander V. Mirgorodskiy
Keywords:
Abstract: We present a three-part approach for diagnosing bugs and performance problems in production distributed environments. First, we introduce a novel execution monitoring technique that dynamically injects a fragment of code, the agent, into an application process on demand. The agent inserts instrumentation ahead of the control flow within the process and propagates into other processes, following communication events, crossing host boundaries, and collecting a distributed function-level trace of the execution. Second, we present an algorithm that separates the trace into user-meaningful activities called flows. This step simplifies manual examination and enables automated analysis of the trace. Finally, we describe our automated root cause analysis technique that compares the flows to help the analyst locate an anomalous flow and identify a function in that flow that is a likely cause of the anomaly. We demonstrate the effectiveness of our techniques by diagnosing two complex problems in the Condor distributed scheduling system.