A scalable self-diagnosing content distribution service with bounded latency

Authors: Tarek F. Abdelzaher, Chengdu Huang

Abstract: Providing contractual performance assurances in distributed systems is an important and challenging problem. From the users' perspective, stringent timing requirements are becoming more critical. Meanwhile, from the system engineers' perspective, systems are driven towards increasingly larger scale, integration, and higher complexity, making predictable performance difficult. In this dissertation, we present the design, implementation, and evaluation of a scalable self-diagnosing content distribution service that provides global bounded latencies on content access. Our solution first involves a decentralized replication scheme that dynamically selects subsets of servers in wide-area networks for different classes of content so that per-class network latency bounds are achieved. The replication decisions are made autonomously by the servers based on measured workload conditions. Replication proceeds in a way that balances workload among servers, hence fully utilizing capacity and avoiding latency bound violations. The efficiency and decentralized nature of the scheme enables our solution to scale up to very large networks. The self-diagnosing capability comes from the learning-based problem diagnosis techniques that we propose. The increasing complexity of distributed systems has motivated the design of machine learning approaches to automate some management tasks. However, with the increase in system scale, current approaches suffer serious scalability issues. We propose two techniques to automatically identify probable causes of performance problems in server systems with multiple tiers and replicated sites. By incorporating a large number of diagnostic information sources, using a temporal segmentation mechanism, and applying transfer learning techniques, we achieve both improved scalability and accuracy.
