Deterministic Replay Using Global Clock

Authors: Yunji Chen, Tianshi Chen, Ling Li, Ruiyang Wu, Daofu Liu

DOI: 10.1145/2445572.2445573

Abstract: Debugging parallel programs is a well-known difficult problem. A promising way to facilitate debugging parallel programs is to use hardware support to achieve deterministic replay on a Chip Multi-Processor (CMP). As a Design-For-Debug (DFD) feature, a practical hardware-assisted deterministic replay scheme should have low design and verification costs, as well as a small log size. To achieve these goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that, instead of recording the logical time orders between instructions or instruction blocks as previous investigations do, LReplay is built upon recording the pending period information infused by the global clock. From the recorded information, about 99% of execution orders are inferrable, implying that LReplay only needs to record directly the residual 1% of noninferrable execution orders in a production run. These 1% noninferrable orders can be addressed by a simple yet cost-effective direction prediction technique, which further reduces the log size of LReplay. Benefiting from the preceding innovations, the overall log size of LReplay over the SPLASH-2 benchmarks is about 0.17 B/K-Inst (bytes per kilo-instruction) for sequential consistency, and 0.57 B/K-Inst for Godson-3 consistency. Such log sizes are an order of magnitude smaller than those of previous deterministic replay schemes that incur no performance loss. Furthermore, LReplay consumes only about 0.5% of the area of the Godson-3 CMP, since it requires only trivial modifications to existing components of Godson-3. These features demonstrate the potential of integrating hardware-assisted deterministic replay into future industrial processors.
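
The abstract describes LReplay only at a high level, so the following is a toy software model of the idea, not the authors' hardware design. It assumes that each memory operation carries a pending period [begin, end] measured in global-clock cycles, that two conflicting operations have an inferrable order when their pending periods do not overlap, and that a hypothetical "earlier begin wins" direction predictor is applied identically at record and replay time, so that only mispredicted, noninferrable orders need to be logged. The names `MemOp`, `inferred_order`, `predict_first`, `build_log`, and `bytes_per_kinst` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MemOp:
    """A memory operation with a hypothetical pending period in global-clock cycles."""
    op_id: str
    core: int
    begin: int  # assumed: global clock when the operation starts pending
    end: int    # assumed: global clock when the operation completes


def inferred_order(a: MemOp, b: MemOp) -> Optional[Tuple[str, str]]:
    """Return (first, second) if non-overlapping pending periods fix the order, else None."""
    if a.end < b.begin:
        return (a.op_id, b.op_id)
    if b.end < a.begin:
        return (b.op_id, a.op_id)
    return None  # overlapping periods: the global clock alone cannot order them


def predict_first(a: MemOp, b: MemOp) -> str:
    """Toy direction predictor (assumption): the op that began pending earlier went first.

    Ties are broken by core number so record and replay always agree.
    """
    return a.op_id if (a.begin, a.core) <= (b.begin, b.core) else b.op_id


def build_log(pairs: List[Tuple[MemOp, MemOp, str]]) -> List[Tuple[str, str]]:
    """Log only conflicting pairs whose order is neither inferrable nor correctly predicted.

    Each pair carries the op_id that actually executed first, known at record time.
    """
    log = []
    for a, b, actual_first in pairs:
        if inferred_order(a, b) is not None:
            continue  # replay can recompute this order from the pending periods
        if predict_first(a, b) == actual_first:
            continue  # the same predictor at replay guesses right; nothing to store
        second = b.op_id if actual_first == a.op_id else a.op_id
        log.append((actual_first, second))
    return log


def bytes_per_kinst(log_bytes: int, instructions: int) -> float:
    """Log-size metric quoted in the abstract: bytes of log per thousand instructions."""
    return log_bytes / (instructions / 1000)


if __name__ == "__main__":
    st = MemOp("st_x", core=0, begin=10, end=14)
    ld1 = MemOp("ld_x_1", core=1, begin=20, end=22)  # starts after st ends: inferrable
    ld2 = MemOp("ld_x_2", core=1, begin=12, end=16)  # overlaps st; predictor is right
    ld3 = MemOp("ld_x_3", core=1, begin=9, end=13)   # overlaps st; predictor is wrong
    pairs = [(st, ld1, "st_x"), (st, ld2, "st_x"), (st, ld3, "st_x")]
    print(build_log(pairs))  # [('st_x', 'ld_x_3')]
    print(bytes_per_kinst(170, 1_000_000))  # 0.17
```

In this toy run, only one of the three conflicting pairs reaches the log: the first is ordered by non-overlapping pending periods, and the second is recovered by the predictor. This mirrors, in miniature, the abstract's claim that most orders are inferrable and that prediction shrinks the residual log further; the real scheme's hardware encoding and predictor are not modeled here.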
