Automatic on-line failure diagnosis at the end-user site

Authors: Spiros Xanthos, Shan Lu, Yuanyuan Zhou, Joseph Tucek, Chengdu Huang

Keywords: Protocol (object-oriented programming), End user, Complete information, Software analysis pattern, Rollback, Computer security, Computer science, Overhead (computing), Programmer, Software

Abstract: Production-run software failures cause endless grief to end users and challenge programmers, who commonly have incomplete information about the bug and face great hurdles in reproducing it. Users are often unable or unwilling to provide diagnostic information due to technical and privacy concerns; even if the information is available, failure analysis is time-consuming. We propose performing initial diagnosis automatically at the end user's site. The moment of failure is a valuable commodity that programmers strive to reproduce; leveraging it directly reduces diagnosis effort while simultaneously addressing privacy concerns. Additionally, we propose a failure diagnosis protocol. As far as we know, this is the first such automatic protocol proposed for on-line diagnosis. By mimicking the steps a human programmer follows when dissecting a failure, we can deduce important failure information. Beyond on-line use, this can also reduce the effort of in-house testing. We implement some of these ideas. Using lightweight checkpoint and rollback techniques and dynamic, run-time software analysis tools, we initiate the automatic diagnosis of several bugs. Our preliminary results show that automatic diagnosis can efficiently and accurately find likely root causes and fault propagation chains. Further, the normal execution overhead is only 2%.
