1. Introduction

This document describes the benchmark suite for a Department of Energy (DOE) National Nuclear Security Administration (NNSA) Advanced Simulation and Computing (ASC) Future Computing Resource (FCR).

1.1. Benchmark Overview

======================  ======================================  ========  ========================  ================
Benchmark               Description                             Language  Parallelism               Libraries
======================  ======================================  ========  ========================  ================
AMG2023                 AMG solver of sparse matrices           C         MPI+CUDA/HIP/SYCL         Hypre
                                                                          OpenMP on CPU
Kripke                  Scalable 3D Sn deterministic            C++       MPI+RAJA                  RAJA, CHAI, Camp
                        particle transport code
Laghos                  LAGrangian High-Order Solver,           C++       MPI+RAJA/CUDA/HIP         MFEM, Hypre
                        unstructured high-order finite
                        element compressible gas dynamics
RAJA Performance Suite  Collection of loop-based computational  C++       MPI+RAJA/CUDA/HIP/OpenMP  RAJA
                        kernels found in HPC applications
ScaFFold                Scale-Free Fractal Benchmark,           Python    MPI/NCCL/RCCL             PyTorch
                        proxy for emerging models such as                 CUDA/HIP
                        programmatic inverse-design projects
Branson                 Implicit Monte Carlo transport          C++       MPI+CUDA/HIP              N/A
Sparta                  Direct Simulation Monte Carlo           C++       MPI+Kokkos                Kokkos
LAMMPS ACE              Molecular dynamics using                C++       MPI+Kokkos                Kokkos
                        Atomic Cluster Expansion (ACE)
Remhos                  REMap High-Order Solver, unstructured   C++       MPI+RAJA/CUDA/HIP         MFEM, Hypre
                        high-order finite element advection
MiniEM                  Electro-Magnetics solver                C++       MPI+Kokkos                Kokkos
MLPerf                  Llama 3.1 405B training                 Python    NCCL+CUDA                 NVIDIA NeMo
======================  ======================================  ========  ========================  ================

1.2. Run Rules Synopsis

Source code modification categories:

  1. Baseline: “out-of-the-box” performance

  • Code modifications are not permitted.

  • Compiler options may be modified. Library substitutions are permitted unless prohibited for a specific benchmark (see the benchmark pages for details). The problem decomposition may be changed.

  • If provided code cannot run on the proposed architecture as-is, limited source code modifications are permitted to port and tune for the target architecture using directives or commonly used interfaces.

  2. Optimized: “speed of light”

  • Aggressive code changes that enhance performance are permitted. Optimizations that will be applicable to mission applications are of more value.

  • Algorithms fundamental to the program may not be replaced. Wholesale algorithm changes, or manual loop rewrites that become strongly architecture-specific, are of less value.

  • The modified code must still pass validation tests.

  • Optimizations will be reviewed by subject matter experts for applicability to the larger application portfolio and other goals such as performance portability and programmer productivity.
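
As an informal aid to reading the synopsis above, the two categories can be sketched as a small checklist in Python. This is only an illustration under stated assumptions: the `Submission` record, its field names, and the `rule_violations` helper are hypothetical and are not part of any benchmark tooling. The sketch encodes only the hard constraints stated above; the benchmark pages remain the authoritative source.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    BASELINE = "baseline"    # "out-of-the-box" performance
    OPTIMIZED = "optimized"  # "speed of light"


@dataclass
class Submission:
    """Hypothetical record of what a submitter changed (illustrative fields only)."""
    category: Category
    source_modified: bool = False
    runs_as_is_on_target: bool = True      # provided code runs unmodified on the target
    porting_only: bool = False             # changes limited to directives or common interfaces
    library_substituted: bool = False
    substitution_prohibited: bool = False  # per-benchmark restriction, if any
    fundamental_algorithm_replaced: bool = False
    passes_validation: bool = True


def rule_violations(s: Submission) -> list[str]:
    """Return the hard rules from the synopsis that the submission appears to break."""
    problems = []
    if s.category is Category.BASELINE:
        if s.library_substituted and s.substitution_prohibited:
            problems.append("library substitution is prohibited for this benchmark")
        if s.source_modified:
            if s.runs_as_is_on_target:
                problems.append("code modifications are not permitted in Baseline")
            elif not s.porting_only:
                problems.append("Baseline porting changes must use directives or commonly used interfaces")
    else:
        # The synopsis states these two constraints for the Optimized category.
        if s.fundamental_algorithm_replaced:
            problems.append("fundamental algorithms may not be replaced")
        if not s.passes_validation:
            problems.append("modified code must still pass validation tests")
    return problems


if __name__ == "__main__":
    # A Baseline run that edits source even though the code already runs on the target.
    print(rule_violations(Submission(Category.BASELINE, source_modified=True)))
```

The sketch deliberately leaves out the softer criteria, such as how much value an optimization has for mission applications and the subject matter expert review, because those are judgment calls rather than pass/fail checks.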

1.3. Approvals