User-Level Socket-Based Checkpointing for Distributed and Parallel Computation

Authors: Jason Ansel, Gene Cooperman, Michael Rieker

Abstract: We present a preliminary description of a user-level checkpointing package, DMTCP, for Linux. The socket-based approach presents a novel method for checkpointing distributed processes. This includes any dynamically created POSIX threads and forked child processes. It also checkpoints remotely spawned processes, whether launched via ssh or by other mechanisms. As with all user-level checkpointing, no modification of the kernel is needed, and the application code is not modified. The package checkpoints signal handlers, ordinary file descriptors, sockets, and certain other types of descriptors. Each checkpointed process has an associated checkpoint file. Hence, migration, even migration of an entire computation to a new cluster, is achieved through the simple expedient of copying the checkpoint files to the new host. However, this approach adds the restriction that the source and destination hosts must be homogeneous.
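The abstract states that the package checkpoints signal handlers and ordinary file descriptors, but it does not say how. The Python sketch below illustrates, under our own assumptions, the kind of bookkeeping a user-level checkpointer needs for these two resources: record each signal's handler and, for each plain file descriptor, the path, open flags, and offset, then reinstate them at the same descriptor number on restart. It is not DMTCP's implementation. The function names are hypothetical, the sketch relies on Linux's `/proc/self/fd`, and it omits memory, registers, threads, and writing the state to a checkpoint file. For demonstration it "restarts" inside the same process.

```python
import fcntl
import os
import signal

# Hypothetical set of signals to record; a real checkpointer would
# walk every catchable signal.
SIGNALS = (signal.SIGINT, signal.SIGTERM, signal.SIGUSR1, signal.SIGUSR2)


def checkpoint_signals(signals=SIGNALS):
    """Record the current Python-level handler for each signal."""
    return {sig: signal.getsignal(sig) for sig in signals}


def restore_signals(saved):
    """Reinstall every recorded handler (must run in the main thread)."""
    for sig, handler in saved.items():
        # getsignal() returns None for handlers not installed from Python;
        # those cannot be reinstalled from here, so they are skipped.
        if handler is not None:
            signal.signal(sig, handler)


def checkpoint_fd(fd):
    """Record what an ordinary-file descriptor refers to (Linux-only)."""
    return {
        "fd": fd,
        "path": os.readlink(f"/proc/self/fd/{fd}"),  # file the fd points at
        "flags": fcntl.fcntl(fd, fcntl.F_GETFL),     # access mode + status flags
        "offset": os.lseek(fd, 0, os.SEEK_CUR),      # current file position
    }


def restore_fd(state):
    """Reopen the file, seek to the saved offset, and reuse the same fd number."""
    # Drop creation-time bits so reopening never creates or truncates the file.
    flags = state["flags"] & ~(os.O_CREAT | os.O_EXCL | os.O_TRUNC)
    new_fd = os.open(state["path"], flags)
    os.lseek(new_fd, state["offset"], os.SEEK_SET)
    if new_fd != state["fd"]:
        # dup2 places the reopened file at the original descriptor number.
        os.dup2(new_fd, state["fd"])
        os.close(new_fd)
    return state["fd"]
```

Because restart reopens files by path, a checkpoint of this kind can only be moved to a host where the same paths exist. Sockets, pipes, and other descriptors do not resolve to reopenable paths and need separate handling.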
