Troubleshooting interactive complexity bugs

Authors: Tarek F. Abdelzaher, Mohammad Maifi Hasan Khan

DOI:

Keywords:

Abstract: The term “interactive complexity” was introduced by Charles Perrow in his famous book Normal Accidents: Living with High-Risk Technologies [1]. He used the term to describe the interacting tendency of systems with a large number of components. He argued that, in systems with a large number of components, multiple failures often interact in some unexpected way, leading to catastrophic failures such as crashes of planes or nuclear power plants. He also suggested that with increasing interactive complexity and tight coupling, such interactions are bound to happen. Indeed, with the proliferation of Internet-enabled cheap embedded devices with built-in sensors and actuators (e.g., smart phones, smart appliances), the physical world is increasingly becoming an integral part of the logical world of computation. As computing systems become much more responsive to their surrounding environments, it becomes more difficult to test the full extent of system behavior before deployment in the real world. Hence, due to increased coupling between the physical world and the logical world, systems fail or perform poorly once deployed in real life. Unintended interactions among various system components across different environments are to blame for the problem. With this growing trend, bugs that arise from the interaction of different distributed components or nodes are likely to get worse and are going to affect system reliability significantly. This calls for new tools and techniques to troubleshoot future software systems. In this dissertation, we address the significant challenge of troubleshooting emerging cyber-physical systems using data mining techniques. More specifically, we applied a discriminative sequence mining algorithm to isolate chains of events (not necessarily contiguous) that are causally correlated with failure by analyzing logs. In the first part of our thesis, using our tool, we successfully identified bugs in a multi-channel MAC (medium access control) layer protocol for wireless sensor networks [2], a kernel-level race condition bug in the LiteOS operating system, and a corner-case design flaw in the directed diffusion protocol [3]. Next, we extended our approach to identify “symbolic” patterns, where absolute values are replaced with abstract symbols whenever appropriate, to identify more subtle patterns. We have examined the applicability of symbolic patterns to identify harmful interactions that may arise from poor integration of adaptive components in server clusters. We have also extended the approach to identify “cyclic” patterns in data center applications, which potentially highlight self-reinforcing loops.
Finally, to complement our work on interactive complexity, we address the problem of diagnosing an occasional “lack of interaction” in the system. Such problems are caused by unresponsive nodes. We develop a tele-diagnostic powertracer, an in-situ troubleshooting tool that uses external power measurements to determine the internal health condition of an unresponsive host and the most likely cause of its failure. Using our tool, we can distinguish several categories of failure behavior, including energy depletion, antenna damage, radio disconnection, system crashes, and anomalous reboots. To the best of our knowledge, we present the first diagnostic tool that diagnoses such failures remotely.
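The abstract describes isolating chains of events, not necessarily contiguous, that are correlated with failure by analyzing logs. The following is a minimal sketch of that general idea, not the dissertation's actual algorithm: it enumerates short ordered subsequences by brute force and ranks them by how much more often they appear in failing runs than in passing runs. The function names, the `max_len` and `min_gap` parameters, and the event names in the sample logs are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def subsequences(events, max_len):
    """Return the set of ordered, possibly non-contiguous subsequences
    of `events` with length 1..max_len (brute-force enumeration)."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(events)), k):
            found.add(tuple(events[i] for i in idx))
    return found


def support(logs, max_len):
    """Count, for each subsequence, how many logs contain it at least once."""
    counts = Counter()
    for log in logs:
        counts.update(subsequences(log, max_len))
    return counts


def discriminative_patterns(good_logs, bad_logs, max_len=2, min_gap=0.5):
    """Rank patterns by (fraction of bad logs containing the pattern)
    minus (fraction of good logs containing it)."""
    good = support(good_logs, max_len)
    bad = support(bad_logs, max_len)
    scored = []
    for pattern, n_bad in bad.items():
        gap = n_bad / len(bad_logs) - good[pattern] / len(good_logs)
        if gap >= min_gap:
            scored.append((gap, pattern))
    # Largest gap first; among ties, longer patterns first, then alphabetical.
    scored.sort(key=lambda s: (-s[0], -len(s[1]), s[1]))
    return [(pattern, gap) for gap, pattern in scored]


# Hypothetical event logs: each inner list is one run's ordered events.
good_logs = [["send", "ack"], ["send", "ack", "send", "ack"]]
bad_logs = [
    ["send", "switch", "ack", "timeout"],
    ["switch", "send", "noise", "timeout"],
]

for pattern, gap in discriminative_patterns(good_logs, bad_logs, min_gap=1.0):
    print(" -> ".join(pattern), gap)
```

Exhaustive enumeration grows quickly with log length and `max_len`, so a sketch like this is only workable on short logs. It also measures correlation with failure only; deciding whether a highly ranked pattern is an actual cause still requires inspecting the system.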

References (95)
Takeaki Uno, Yuzo Uchida, Tatsuya Asai, Hiroki Arimura. LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets. FIMI (2003).
Tarek Abdelzaher, Liqian Luo, Mohammad Maifi Hasan Khan, Chengdu Huang. SNTS: Sensor Network Troubleshooting Suite. Distributed Computing in Sensor Systems, pp. 142-157 (2007). 10.5555/1769087.1769097
Richard Maclin, Lars Asker. Ensembles as a Sequence of Classifiers. International Joint Conference on Artificial Intelligence, vol. 2, pp. 860-865 (1997).
KG Langendoen, GP Halkes, M. Lodder. A Global-State Perspective on Sensor Network Debugging. HotEmNets 2008, pp. 37-41 (2008).
Xifeng Yan, Jiawei Han, Chao Liu. Mining Control Flow Abnormality for Logic Error Isolation. SIAM International Conference on Data Mining, pp. 106-117 (2006).
Tarek Abdelzaher, Praveen Jayachandran, Insik Shin, Jin Heo, Dong Wang. OptiTuner: An Automatic Distributed Performance Optimization Service and a Server Farm Application (2009).
Ramakrishnan Srikant, Rakesh Agrawal. Fast Algorithms for Mining Association Rules. Very Large Data Bases, pp. 580-592 (1998).
Mohammad Maifi Hasan Khan, Tarek Abdelzaher, Kamal Kant Gupta. Towards Diagnostic Simulation in Sensor Networks. Distributed Computing in Sensor Systems, pp. 252-265 (2008). 10.1007/978-3-540-69170-9_17
Zhenyu Guo, Feibo Chen, Xuezheng Liu, Xi Wang, Ming Wu, Zheng Zhang, Xiaochen Lian, Jian Tang, M. Frans Kaashoek. D3S: Debugging Deployed Distributed Systems. Networked Systems Design and Implementation, pp. 423-437 (2008).