Authors: Tarek F. Abdelzaher, Chengdu Huang
DOI:
Keywords:
Abstract: Providing contractual performance assurances in distributed systems is an important and challenging problem. From the users' perspective, stringent timing requirements are becoming more critical. Meanwhile, from the system engineers' perspective, distributed systems are driven towards increasingly larger scale, higher integration, and higher complexity, making predictable performance difficult to provide. In this dissertation, we present the design, implementation, and evaluation of a scalable, self-diagnosing content distribution service that provides global bounded latencies on content access. Our solution first involves a decentralized replication scheme that dynamically selects subsets of servers in wide-area networks for different content classes, so that per-class network latency bounds are achieved. The replication decisions are made autonomously by the servers based on measured workload conditions. Replication proceeds in a way that balances the workload among servers, hence fully utilizing server capacity and avoiding latency bound violations. The efficiency and decentralized nature of the scheme enable our service to scale up to very large networks. The self-diagnosing capability comes from the learning-based problem diagnosis techniques we propose. The increasing complexity of distributed systems has motivated the design of machine learning approaches to automate some system management tasks. However, with the increase in system scale, current approaches suffer serious scalability issues. We propose two techniques to automatically identify the probable causes of performance problems in server systems with multiple tiers and replicated sites. By incorporating a number of diagnostic information sources, using a temporal segmentation mechanism, and applying transfer learning techniques, we achieve improved accuracy.