Authors: Eric Brewer, Emre Kiciman, Mike Y. Chen, Armando Fox, Anthony Accardi
DOI:
Keywords:
Abstract: We present a new approach to managing failures and evolution in large, complex distributed systems using runtime paths. We use the paths that requests follow as they move through the system as our core abstraction, and our "macro" approach focuses on component interactions rather than the details of the components themselves. Paths record component performance and interactions, are user- and request-centric, and occur in sufficient volume to enable statistical analysis, all in a way that is easily reusable across applications. Automated statistical analysis of multiple paths allows for the detection and diagnosis of complex failures and the assessment of evolution issues. In particular, our approach enables significantly stronger capabilities in failure detection, failure diagnosis, impact analysis, and understanding system evolution. We explore these capabilities with three real implementations, two of which service millions of requests per day. Our contributions include the approach; the maintainable, extensible, and reusable architecture; the various statistical analysis engines; and the discussion of our experience with a high-volume production service over several years.