Authors: Jason Ansel, Gene Cooperman, Michael Rieker
DOI:
Keywords:
Abstract: We present a preliminary description of a user-level checkpointing package, DMTCP, for Linux. The socket-based approach presents a novel method for checkpointing distributed processes. This includes any dynamically created POSIX threads and forked child processes. It also includes remotely spawned processes via ssh and other mechanisms. As with all user-level checkpointing, no modification of the kernel is needed, and the application code is not modified. The package checkpoints signal handlers, ordinary file descriptors, socket descriptors, and certain other types of descriptors. Each checkpointed process has an associated checkpoint file. Hence, process migration, and even migration of an entire computation to a new cluster, are achieved through the simple expedient of copying the checkpoint files to the new host. However, migration adds the additional restriction that the source and destination hosts must be homogeneous.