Capturing and identifying a complete and consistent set of checkpoint files

Authors: Kalman Zvi Meth, Adnan M. Agbaria

DOI:

Keywords:

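Abstract: A complete and consistent set of checkpoint files is captured and identified for use in restarting a parallel program. When each process of a parallel program takes a checkpoint, it creates a checkpoint file. The checkpoint file is named, and part of that name includes a version number for the checkpoint file. When the parallel program is to be restarted, each process identifies its most current valid checkpoint file. It provides the version number of this file to a coordinating process. The coordinating process then decides which version of the checkpoint files is valid for all of the processes participating in the restart. Once this version number is determined, it is forwarded to the processes, and the processes restore themselves using their corresponding checkpoint files having that particular version number.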
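
The version-selection scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The file naming pattern (`<program>.<rank>.ckpt.<version>`), the validity check (a non-empty file), and the rule that the coordinator picks the lowest version reported by any process are all assumptions made for the example; the abstract only states that the version number is part of the file name and that the coordinator decides which version is valid for all processes.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: <program>.<rank>.ckpt.<version>
_NAME = re.compile(r"^(?P<prog>.+)\.(?P<rank>\d+)\.ckpt\.(?P<version>\d+)$")


def checkpoint_name(prog, rank, version):
    """Build a checkpoint file name that embeds the version number."""
    return f"{prog}.{rank}.ckpt.{version}"


def _non_empty(path):
    # Assumed validity test: a real system would verify the file more
    # thoroughly (for example, a completion marker or a checksum).
    return path.stat().st_size > 0


def latest_valid_version(directory, prog, rank, is_valid=_non_empty):
    """Return the highest version with a valid checkpoint file for this
    process, or None if the process has no valid checkpoint."""
    versions = []
    for path in Path(directory).iterdir():
        m = _NAME.match(path.name)
        if m and m["prog"] == prog and int(m["rank"]) == rank and is_valid(path):
            versions.append(int(m["version"]))
    return max(versions, default=None)


def choose_restart_version(reported):
    """Coordinator step. `reported` maps rank -> latest valid version.

    Assumption: the newest version that every process can restore is the
    lowest version reported, and each process has kept that version on disk.
    Returns None if any process has no valid checkpoint.
    """
    if not reported or any(v is None for v in reported.values()):
        return None
    return min(reported.values())
```

In use, each process would call `latest_valid_version` on its own files and send the result to the coordinator, which calls `choose_restart_version` and forwards the chosen number; each process then restores from `checkpoint_name(prog, rank, chosen)`.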
