Subtleties in tolerating correlated failures in wide-area storage systems

Authors: Srinivasan Seshan, Suman Nath, Haifeng Yu, Phillip B. Gibbons

Abstract: High availability is widely accepted as an explicit requirement for distributed storage systems. Tolerating correlated failures is a key issue in achieving high availability in today's wide-area environments. This paper systematically revisits previously proposed techniques for addressing correlated failures. Using several real-world failure traces, we qualitatively answer four important questions regarding how to design systems to tolerate such failures. Based on our results, we identify a set of design principles that system builders can use to tolerate correlated failures. We show how these lessons can be effectively used by incorporating them into IrisStore, a distributed read-write storage layer that provides high availability. Our results using IrisStore on the PlanetLab over an 8-month period demonstrate its ability to withstand large correlated failures and meet preconfigured availability targets.
