SCR: Scalable Checkpoint/Restart for MPI
November 27, 2017

Multilevel checkpointing lets HPC applications take frequent, inexpensive checkpoints alongside less frequent, more resilient ones, which improves efficiency and reduces load on the parallel file system. LLNL researchers developed the Scalable Checkpoint/Restart (SCR) library to bring this approach to large-scale production systems.
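
To make the idea concrete, here is a minimal Python sketch of a two-level checkpoint schedule. It is an illustration only and does not use or reproduce SCR's API: the function names, the level labels `cheap` and `resilient`, and the fixed intervals of 10 and 100 steps are all assumptions chosen for the example.

```python
from collections import Counter


def checkpoint_level(step, cheap_every=10, resilient_every=100):
    """Return the checkpoint level to take at this step, or None.

    Hypothetical policy for illustration, not SCR's scheduler:
    - "resilient": infrequent and expensive (e.g. written to the
      parallel file system)
    - "cheap": frequent and inexpensive (assumed here to be some
      faster, less resilient storage)
    """
    if step <= 0:
        return None
    # The resilient level takes priority when both intervals coincide.
    if step % resilient_every == 0:
        return "resilient"
    if step % cheap_every == 0:
        return "cheap"
    return None


def schedule(total_steps, cheap_every=10, resilient_every=100):
    """Count how many checkpoints of each level a run would take."""
    counts = Counter()
    for step in range(1, total_steps + 1):
        level = checkpoint_level(step, cheap_every, resilient_every)
        if level is not None:
            counts[level] += 1
    return counts


if __name__ == "__main__":
    counts = schedule(1000)
    print("cheap checkpoints:", counts["cheap"])
    print("resilient checkpoints:", counts["resilient"])
```

With these assumed intervals, a 1000-step run takes 100 checkpoints in total but sends only 10 of them to the expensive level, which is the source of the reduced load on the parallel file system.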

Learn more on our Computation website. Read the SCR user guide and fork the code on GitHub.