NetCP: Consistent, Non-Interruptive and Efficient Checkpointing and Rollback of SDN

Authors: Ye Yu, Chen Qian, Wenfei Wu, Ying Zhang

DOI: 10.1109/IWQOS.2018.8624142

Keywords: Control theory; Consistency (database systems); Overhead (computing); Rollback recovery; Traverse; Rollback; Computer science; Interrupt; Computer network

Abstract: Network failures are inevitable given the increasing complexity of networks, and they significantly hamper system availability and performance. While adopting checkpointing and rollback recovery protocols (C/R for short) from distributed systems into computer networks is promising, several specific challenges appear when designing a C/R system for Software-Defined Networks (SDN). The C/R should be coordinated with other applications in the SDN controller, the C/R of each individual switch should not interrupt the traffic traversing it, and the controller C/R faces the challenge of time and space overhead. We propose a C/R framework for SDN, named NetCP. NetCP coordinates applications and switches to get consistent global checkpoints, it leverages redundant forwarding tables in switches so as to avoid interrupting traffic, and it analyzes the dependencies between checkpoints to make a minimal rollback decision. We have implemented NetCP in a prototype using current standard SDN tools and demonstrate that it achieves consistency, non-interruption, and efficiency with negligible overhead.
