WiDS Checker: Combating Bugs in Distributed Systems

Authors: Xuezheng Liu, Zheng Zhang, Wei Lin, Aimin Pan

Keywords: Suite; Programming language; State (computer science); Log mining; Scripting language; Distributed computing; Debugging; Computer science; Process (engineering); Paxos; Software deployment

Abstract: Despite many efforts, the predominant practice of debugging a distributed system is still printf-based log mining, which is both tedious and error-prone. In this paper, we present WiDS Checker, a unified framework that can check distributed systems through both simulation and reproduced runs from real deployment. All instances of a distributed system can be executed within one process, multiplexed properly to observe the "happens-before" relationship, thus accurately revealing the full system state. A versatile script language allows a developer to refine system properties into straightforward assertions, which the checker inspects for violations. Combining these two components, we are able to check properties that are otherwise impossible to check. We applied WiDS Checker over a suite of complex systems and found non-trivial bugs, including one in a previously proven Paxos specification. Our experience demonstrates the usefulness of the checker and allows us to gain insights beneficial to future research in this area.
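
The abstract combines two mechanisms: running every instance in one process so that the happens-before relationship and the full system state become observable, and developer-written assertions evaluated against that state. The following is a minimal Python sketch of how those two pieces can fit together, under stated assumptions; it is not WiDS Checker's API or implementation. The names `Simulator`, `happens_before`, `first_value_wins`, and `agreement` are hypothetical. Vector clocks are used here as one standard way to track happens-before (the abstract does not say which mechanism WiDS uses), and plain Python functions stand in for the paper's script language.

```python
# Hypothetical sketch, NOT WiDS Checker's API: all nodes run in one process,
# events are delivered one at a time, and invariants are checked after each one.
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Clock = Tuple[int, ...]


@dataclass(order=True)
class Event:
    # Ordered by (virtual time, sequence number) so every run is deterministic.
    time: float
    seq: int
    dst: int = field(compare=False)
    src: int = field(compare=False)
    payload: str = field(compare=False)
    clock: Clock = field(compare=False)  # sender's vector clock at send time


class Node:
    def __init__(self, nid: int, n: int) -> None:
        self.nid = nid
        self.clock = [0] * n
        self.state: Dict[str, str] = {}


def happens_before(a: Clock, b: Clock) -> bool:
    # a -> b iff a <= b componentwise and a != b (vector-clock ordering).
    return all(x <= y for x, y in zip(a, b)) and a != b


Invariant = Tuple[str, Callable[[List[Node]], bool]]


class Simulator:
    """Multiplexes every node of a toy distributed system inside one process."""

    def __init__(self, n: int, handler: Callable[..., None],
                 invariants: List[Invariant]) -> None:
        self.nodes = [Node(i, n) for i in range(n)]
        self.handler = handler
        self.invariants = invariants
        self.queue: List[Event] = []
        self.seq = 0
        self.now = 0.0
        self.trace: List[Tuple[int, Clock]] = []  # (node, clock) after each delivery
        self.violations: List[Tuple[str, float, int]] = []

    def send(self, src: int, dst: int, payload: str, delay: float = 1.0) -> None:
        node = self.nodes[src]
        node.clock[src] += 1  # the send is a local event on src
        ev = Event(self.now + delay, self.seq, dst, src, payload, tuple(node.clock))
        heapq.heappush(self.queue, ev)
        self.seq += 1

    def run(self) -> None:
        while self.queue:
            ev = heapq.heappop(self.queue)
            self.now = ev.time
            node = self.nodes[ev.dst]
            # Receive: merge the sender's clock, then tick the local entry.
            node.clock = [max(a, b) for a, b in zip(node.clock, ev.clock)]
            node.clock[ev.dst] += 1
            self.handler(self, node, ev)
            self.trace.append((ev.dst, tuple(node.clock)))
            # Only one event runs at a time, so all node states form a consistent snapshot.
            for name, holds in self.invariants:
                if not holds(self.nodes):
                    self.violations.append((name, ev.time, ev.dst))


def first_value_wins(sim: Simulator, node: Node, ev: Event) -> None:
    # Deliberately unsafe toy rule: decide on the first proposal that arrives.
    node.state.setdefault("decided", ev.payload)


def agreement(nodes: List[Node]) -> bool:
    # Safety property: no two nodes decide different values.
    decided = {n.state["decided"] for n in nodes if "decided" in n.state}
    return len(decided) <= 1


sim = Simulator(4, first_value_wins, [("agreement", agreement)])
sim.send(0, 2, "A", delay=1.0)
sim.send(0, 3, "A", delay=2.0)
sim.send(1, 3, "B", delay=1.0)
sim.send(1, 2, "B", delay=2.0)
sim.run()
print(sim.violations[0])  # ('agreement', 1.0, 3)
```

Because nodes 2 and 3 each decide on whichever proposal reaches them first, they settle on different values. The checker flags the violation at the moment node 3 decides, and the recorded vector clocks, (1, 0, 1, 0) for node 2's decision and (0, 1, 0, 1) for node 3's, are concurrent: neither decision happens-before the other, so neither node could have known about the other's choice. This kind of cross-node property is exactly what per-node printf logs make hard to check.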