Authors: Srinivasan Seshan, Suman Nath, Haifeng Yu, Phillip B. Gibbons
DOI:
Keywords:
Abstract: High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today's wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using several real-world failure traces, we qualitatively answer four important questions regarding how to design systems to tolerate such failures. Based on our results, we identify a set of design principles that system builders can use to tolerate correlated failures. We show how these lessons can be effectively used by incorporating them into IrisStore, a distributed read-write storage layer that provides high availability. Our results using IrisStore on the PlanetLab over an 8-month period demonstrate its ability to withstand large correlated failures and meet preconfigured availability targets.