Hierarchical storage systems are the wave of the future for HPC centers like LLNL’s Livermore Computing Complex. The Unify project aims to improve I/O performance by utilizing distributed, node-local storage systems. This design scales bandwidth and capacity according to the computer resources used by a given job. Furthermore, Unify avoids inter-job interference from parallel file systems or shared burst buffers.
Unify is a suite of specialized, flexible file systems – the first is available on GitHub with more on the way – that can be included in a user’s job allocations. A user can request which Unify file system(s) to be loaded and the respective mount points. Tests on LLNL’s Catalyst cluster show more than 2x improvement in write performance.
Like much of LLNL’s HPC performance improvement software, Unify is open source. The first Unify file system, UnifyCR (for checkpoint/restart workloads), is already available on GitHub. The team is working on another file system in the Unify “family” designed for machine learning workloads, in which large data sets need to be distributed quickly. Additional Unify file systems are in development.