1. Introduction

This is benchmark documentation for a Department of Energy (DOE) National Nuclear Security Administration (NNSA) Advanced Simulation and Computing (ASC) Future Computing Resource (FCR).

1.1. Benchmark Overview

Benchmark	Description	Language	Parallelism	Libraries
AMG2023	AMG solver of sparse matrices	C	MPI+CUDA/HIP/SYCL; OpenMP on CPU	Hypre
Kripke	Scalable 3D Sn deterministic particle transport code	C++	MPI+RAJA	RAJA, CHAI, Camp
Laghos	LAGrangian High-Order Solver, unstructured high-order finite element compressible gas dynamics	C++	MPI+RAJA/CUDA/HIP	MFEM, Hypre
RAJA Performance Suite	Collection of loop-based computational kernels found in HPC applications	C++	MPI+RAJA/CUDA/HIP/OpenMP	RAJA
ScaFFold	Scale-Free Fractal Benchmark; proxy for emerging models such as programmatic inverse-design projects	Python	MPI/NCCL/RCCL; CUDA/HIP	PyTorch
Branson	Implicit Monte Carlo transport	C++	MPI+CUDA/HIP	N/A
Sparta	Direct Simulation Monte Carlo	C++	MPI+Kokkos	Kokkos
LAMMPS ACE	Molecular dynamics using Atomic Cluster Expansion (ACE)	C++	MPI+Kokkos	Kokkos
Remhos	REMap High-Order Solver, unstructured high-order finite element advection	C++	MPI+RAJA/CUDA/HIP	MFEM, Hypre
MiniEM	Electro-Magnetics solver	C++	MPI+Kokkos	Kokkos
MLPerf	Llama 3.1 405B training	Python	NCCL+CUDA	NVIDIA NeMo

1.2. Run Rules Synopsis

Source code modification categories:

Baseline: “out-of-the-box” performance

Code modifications are not permitted.

Compiler options may be modified. Library substitutions are permitted unless prohibited for a specific benchmark (see the benchmark pages for details). The problem decomposition may be changed.

If provided code cannot run on the proposed architecture as-is, limited source code modifications are permitted to port and tune for the target architecture using directives or commonly used interfaces.

Optimized: “speed of light”

Aggressive code changes that enhance performance are permitted. Optimizations that will be applicable to mission applications are of more value.

Algorithms fundamental to the program may not be replaced. Wholesale algorithm changes or manual rewriting of loops that become strongly architecture specific are of less value.

The modified code must still pass validation tests.

Optimizations will be reviewed by subject matter experts for applicability to the larger application portfolio and other goals such as performance portability and programmer productivity.

1.3. Approvals

Benchmarks is released under the Creative Commons Attribution 4.0 International Public License. For more details, see the LICENSE (https://github.com/LLNL/benchmarks/blob/develop/LICENSE) and NOTICE (https://github.com/LLNL/benchmarks/blob/develop/NOTICE) files. SPDX-License-Identifier: CC-BY-4.0. LLNL-DATA-2007856.
Content from Sandia National Laboratories is considered unclassified with unlimited distribution under SAND2023-12176O, SAND2023-01069O, and SAND2023-01070O.