8. LAMMPS ACE
Note
The documentation herein needs to be updated for current performance.
This is the documentation for the benchmark [LAMMPS], specifically KOKKOS-LAMMPS (see [KOKKOS-LAMMPS]). The content herein was created by the following authors (in alphabetical order).
This material is based upon work supported by the Sandia National Laboratories (SNL), a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia under the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Content herein considered unclassified with unlimited distribution under SAND2023-01070O.
8.1. Purpose
The following description is drawn largely from the LAMMPS website [lammps-site]:
LAMMPS is a classical molecular dynamics code with a focus on materials modeling. It’s an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. Many of its models have versions that provide accelerated performance on CPUs, GPUs, and Intel Xeon Phis. The code is designed to be easy to modify or extend with new functionality.
8.2. Characteristics
The goal is to utilize the specified version of LAMMPS (see Application Version) that runs the benchmark problem (see Problem) correctly (see Correctness if changes are made to LAMMPS).
8.2.1. Application Version
The command to clone is provided below.
git clone git@github.com:lammps/lammps.git
cd lammps
git checkout a51f9ba0e719be544293987bb3cbd9939f1b01ee
Note
The Git SHA will be updated with a tag soon.
The script to clone can be downloaded from lammps_clone.sh. It can also be executed in place to clone into
docs/32_lammpsACE/lammps.
cd docs/32_lammpsACE
./lammps_clone.sh
8.2.2. Problem
This problem runs an ACE (atomic cluster expansion) machine-learned potential for a copper crystal using a face-centered cubic (fcc) lattice at 300 K. Please refer to [pace-site] and [pace-article] for more information.
This problem is mostly present within the upstream LAMMPS repository. The components of this problem are listed below (paths given are within LAMMPS repository). Each of these files will need to be copied into a run directory for the simulation.
examples/PACKAGES/pace/Cu-PBE-core-rep.ace: This is an input needed for the simulation.
examples/PACKAGES/pace/in.pace.product: This is the default input file that controls the simulation. Some parameters within this file may need to be changed depending upon what is being run (i.e., these parameters control how much memory it uses). The modified version of this within the template directory should be preferred; more on this below.
A template run directory was created to help ease performing a
simulation; this directory is templatedir. There are some key
files within it.
templatedir/in.pace.product: This is a modified version of the input file with some key parameters changed to be more appropriate for a benchmark. It is designed to run for approximately 11 minutes in 2 phases of 5.5 minutes each. LAMMPS directly computes the FOM and outputs it for each phase. The FOM of the second 5.5-minute phase is the one to be tracked.
templatedir/lammps_ln.sh: This file creates symbolic links to files and folders needed for the simulation.
templatedir/lammps_batch_elcapitan.sh: This is a batch script compatible with El Capitan. It has capabilities for setting key job parameters from the command line; more on that below.
An excerpt from this input file that has its key parameters is provided below.
<snip>
variable L index 64.0
region box block 0 ${L} 0 ${L} 0 ${L}
<snip>
pair_style pace product chunksize 49152
<snip>
thermo 10
thermo_style custom step cpu temp epair etotal press v_delenergy v_delpress
<snip>
##################################
### Benchmarking modifications ###
##################################
# Add a thermostat to keep temperature from falling
variable tdamp equal $(dt)
fix mynvt all nvt temp 300.0 300.0 ${tdamp}
# Some systems buffer extensively
thermo_modify flush yes
# Print out the value of L for parsing ease
print "The value of L is $L"
### Throw out first 5 minutes for hardware equilibrium
# Stop after 5.5 minutes
fix 2 all halt 10 tlimit > 330.0 message no error continue
run 10000000
### Run another 5 minutes for final FOM
unfix 2
# Stop after 5.5 minutes
fix 3 all halt 10 tlimit > 330.0 message no
run 10000000
These parameters are described below.
L: This corresponds to the length scale factor. This will scale the dimensions of the problem.
thermo: Compute and print thermodynamic info (e.g., temperature, energy, pressure) on timesteps that are a multiple of this parameter and at the beginning and end of a simulation.
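As a rough guide to how L sets the problem size, the sketch below counts atoms for a cubic fcc box. It assumes the region is measured in lattice units and that L is an integer number of unit cells per side; under those assumptions an fcc box holds 4 atoms per cell, which matches the 1048576 atoms reported in the sample log for L = 64. The function name is illustrative and not part of LAMMPS.

```python
def fcc_atom_count(length_cells: float) -> int:
    """Atoms in a cubic fcc box with `length_cells` unit cells per side.

    Assumes the box edges are given in lattice units and that the box
    is filled with whole unit cells.
    """
    atoms_per_cell = 4  # fcc: 8 corners * 1/8 + 6 faces * 1/2
    return round(atoms_per_cell * length_cells ** 3)


# Show how quickly the atom count (and hence memory use) grows with L.
for length in (32.0, 64.0, 128.0):
    print(f"L = {length:g}: {fcc_atom_count(length)} atoms")
```

Because the atom count grows with the cube of L, small changes in L translate into large changes in memory footprint.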
The runtime characteristics of this problem differ depending on whether Kokkos is enabled. Specifically, some work performed within Kokkos helps keep the throughput of this problem as well behaved as possible. Kokkos must therefore be enabled for all simulations regardless of the hardware being used (the configurations herein enable it for reference).
8.2.3. Figure of Merit
Each LAMMPS simulation writes out a file named “log.lammps”. At the end of this file is a block that resembles the following example.
Step CPU Temp E_pair TotEng Press v_delenergy v_delpress
640 0 299.7264 -3834241 -3793616.4 62562.774 -3.7252903e-08 4.8748916e-10
650 5.1882405 300.1416 -3834085.9 -3793405 62656.487 3.7252903e-08 2.2555469e-10
660 10.389581 300.04536 -3834003.9 -3793336 62705.836 -1.4901161e-08 2.910383e-11
<snip>
1260 323.38353 300.55705 -3834187.5 -3793450.4 62842.117 9.778887e-09 1.5279511e-10
1270 328.58739 300.25528 -3834141.7 -3793445.4 62861.607 1.0244548e-08 -5.0931703e-10
1280 333.79045 300.1357 -3834154.7 -3793474.6 62856.262 -1.1641532e-08 1.6734703e-10
Loop time of 333.812 on 4 procs for 640 steps with 1048576 atoms
Performance: 0.083 ns/day, 289.767 hours/ns, 1.917 timesteps/s, 2.010 Matom-step/s
45.1% CPU use with 4 MPI tasks x 1 OpenMP threads
The quantity of interest (QOI) is “Mega atom steps per second,” which
is directly computed as Matom-step/s in the example above.
It is desired to capture the FOM for varying problem sizes that encompass utilizing 50% to 80% of available memory (when all PEs are utilized). The ultimate goal is to maximize this throughput FOM while utilizing at least 50% of available memory.
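The reported throughput can be cross-checked by hand from the "Loop time" line of the example above. The following is a minimal sketch of that arithmetic; the function name is an assumption made for illustration, and the inputs are the atom count, step count, and loop time quoted in the sample output.

```python
def matom_steps_per_second(atoms: int, steps: int, loop_time_s: float) -> float:
    """Throughput in millions of atom-steps per second."""
    return atoms * steps / loop_time_s / 1.0e6


# Values taken from the sample "Loop time" line:
# "Loop time of 333.812 on 4 procs for 640 steps with 1048576 atoms"
fom = matom_steps_per_second(atoms=1048576, steps=640, loop_time_s=333.812)
print(f"{fom:.3f} Matom-step/s")
```

The result agrees with the 2.010 Matom-step/s printed by LAMMPS on the "Performance" line.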
8.2.4. Correctness
The aforementioned relevant block of output within “log.lammps” is replicated below.
Step CPU Temp E_pair TotEng Press v_delenergy v_delpress
640 0 299.7264 -3834241 -3793616.4 62562.774 -3.7252903e-08 4.8748916e-10
650 5.1882405 300.1416 -3834085.9 -3793405 62656.487 3.7252903e-08 2.2555469e-10
660 10.389581 300.04536 -3834003.9 -3793336 62705.836 -1.4901161e-08 2.910383e-11
<snip>
1260 323.38353 300.55705 -3834187.5 -3793450.4 62842.117 9.778887e-09 1.5279511e-10
1270 328.58739 300.25528 -3834141.7 -3793445.4 62861.607 1.0244548e-08 -5.0931703e-10
1280 333.79045 300.1357 -3834154.7 -3793474.6 62856.262 -1.1641532e-08 1.6734703e-10
Loop time of 333.812 on 4 procs for 640 steps with 1048576 atoms
Performance: 0.083 ns/day, 289.767 hours/ns, 1.917 timesteps/s, 2.010 Matom-step/s
45.1% CPU use with 4 MPI tasks x 1 OpenMP threads
There are several columns of interest regarding correctness; these are listed below.
Step: This is the step number and is the first column.
Temp: This tracks the temperature aspect of the simulation.
Press: This tracks the pressure aspect of the simulation.
Assessing the correctness will involve comparing these quantities across modified (henceforth denoted with “mod” subscript) and unmodified (“unmod” subscript) LAMMPS subject to the methodology below.
The first step is to adjust the thermo parameter
to a value of 1 so fine-grained output is generated; if this
significantly slows down the computation, then it can be increased to a
value of 10. Then, produce output from unmodified LAMMPS (LAMMPS_unmod) with the
same settings.
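The second step is to compute the absolute differences between modified and unmodified LAMMPS for Temp and Press for each row, i, whose Step is relevant for the FOM for LAMMPS_mod,
$$\Delta_{\mathrm{Temp},i} = \left| \mathrm{Temp}_{\mathrm{mod},i} - \mathrm{Temp}_{\mathrm{unmod},i} \right|$$
$$\Delta_{\mathrm{Press},i} = \left| \mathrm{Press}_{\mathrm{mod},i} - \mathrm{Press}_{\mathrm{unmod},i} \right|$$
where i is each line whose CPU time is part of the second phase for LAMMPS_mod.
The third step is to compute the arithmetic mean of each of the aforementioned quantities over the n rows,
$$\mu_{\Delta\mathrm{Temp}} = \frac{1}{n} \sum_{i=1}^{n} \Delta_{\mathrm{Temp},i}$$
$$\mu_{\Delta\mathrm{Press}} = \frac{1}{n} \sum_{i=1}^{n} \Delta_{\mathrm{Press},i}$$
where n is the number of rows selected in the second step.
The fourth step is to compute the arithmetic mean of the n matching rows of the unmodified LAMMPS,
$$\mu_{\mathrm{Temp},\mathrm{unmod}} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Temp}_{\mathrm{unmod},i}$$
$$\mu_{\mathrm{Press},\mathrm{unmod}} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Press}_{\mathrm{unmod},i}$$
The fifth step is to normalize the differences with the baseline values to create the error ratios,
$$\varepsilon_{\mathrm{Temp}} = \frac{\mu_{\Delta\mathrm{Temp}}}{\mu_{\mathrm{Temp},\mathrm{unmod}}}$$
$$\varepsilon_{\mathrm{Press}} = \frac{\mu_{\Delta\mathrm{Press}}}{\mu_{\mathrm{Press},\mathrm{unmod}}}$$
The sixth and final step is to check all of the error ratios; if any of them exceeds 5%, then the modifications are not approved without discussing them with this benchmark’s authors. The success criteria are:
$$\varepsilon_{\mathrm{Temp}} \le 5\%$$
$$\varepsilon_{\mathrm{Press}} \le 5\%$$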
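The six steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than an official checking tool: it assumes the second-phase thermo rows have already been parsed into dictionaries keyed by column name, matches modified and unmodified rows by Step, and divides by the absolute value of the baseline mean to guard against sign issues. The function names and the perturbed "modified" values are hypothetical.

```python
from statistics import fmean


def error_ratios(mod_rows, unmod_rows, columns=("Temp", "Press")):
    """Mean absolute difference per column, normalized by the unmodified mean."""
    unmod_by_step = {row["Step"]: row for row in unmod_rows}
    # Keep only rows whose Step appears in both runs (the n matching rows).
    pairs = [(m, unmod_by_step[m["Step"]])
             for m in mod_rows if m["Step"] in unmod_by_step]
    if not pairs:
        raise ValueError("no matching Step values between the two runs")
    ratios = {}
    for col in columns:
        mean_abs_diff = fmean(abs(m[col] - u[col]) for m, u in pairs)
        mean_unmod = fmean(u[col] for _, u in pairs)
        ratios[col] = mean_abs_diff / abs(mean_unmod)
    return ratios


def passes(ratios, tolerance=0.05):
    """True when every error ratio is within the 5% criterion."""
    return all(value <= tolerance for value in ratios.values())


# Baseline rows copied from the sample log; the "modified" rows are made up.
unmod = [
    {"Step": 1260, "Temp": 300.55705, "Press": 62842.117},
    {"Step": 1270, "Temp": 300.25528, "Press": 62861.607},
    {"Step": 1280, "Temp": 300.1357, "Press": 62856.262},
]
mod = [{"Step": r["Step"], "Temp": r["Temp"] + 0.3, "Press": r["Press"] - 60.0}
       for r in unmod]

ratios = error_ratios(mod, unmod)
print(ratios, "approved" if passes(ratios) else "needs discussion")
```

Matching rows by Step rather than by position keeps the comparison valid when the two runs stop at slightly different steps because of the wall-clock time limit.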
8.3. Source Code Modifications
Please see Run Rules Synopsis for general guidance on allowed modifications.
8.4. System Information
The platforms utilized for benchmarking activities are listed and described below.
Advanced Technology System 4 (ATS-4), also known as El Capitan (see El Capitan)
8.5. Building
A script (lammps_clone.sh) is provided to clone the LAMMPS
repository within the “lammps” folder. Instructions are provided on
how to build LAMMPS for the following systems:
Generic (see Generic)
Advanced Technology System 4 (ATS-4), also known as El Capitan (see El Capitan)
8.5.1. Generic
Refer to the LAMMPS [lammps-build] documentation for generic instructions.
8.5.2. El Capitan
Instructions for building on El Capitan are provided below. These instructions assume this repository has been cloned and that the current working directory is at the top level of this repository.
cd docs/32_lammpsACE
./lammps_build_elcapitan.sh
The script discussed above is lammps_build_elcapitan.sh and is reproduced below for convenience and
reference.
#!/usr/bin/env bash
# set top-level script parameters
umask 022
set -e
set -x
# create vars for common directories and files
dir_root="`git rev-parse --show-toplevel`"
dir_pwd="` pwd -P `"
dir_src="${dir_pwd}/lammps"
dir_build="${dir_pwd}/lammps/_build"
file_log="${dir_pwd}/lammps_build.log"
# redirect STDOUT and STDERR through tee
exec &> >(tee >(ts '[%Y-%m-%d %H:%M:%S]' > "${file_log}"))
# let's turn on verbosity now
set -v
# output for posterity
hostname
uptime
lscpu
# clean and reset source
pushd "${dir_src}"
git clean -fdx
git reset --hard
popd
# create build directory
test -d "${dir_build}" && rm -rf "${dir_build}"
mkdir -p "${dir_build}"
# build
# list current environment
module list
# alter environment
. lammps_env_elcapitan.sh
# list current environment
module list
pushd "${dir_build}"
cmake \
    -C ../cmake/presets/elcapitan_kokkos.cmake \
    -DPKG_ML-PACE=on \
    -DBUILD_MPI=on \
    -D CMAKE_BUILD_TYPE=Release \
    ../cmake
/usr/bin/time --verbose -- \
    nice -n 1 \
    gmake -j 64
popd
# gracefully exit
exit 0
8.6. Running
Instructions are provided on how to run LAMMPS for the following systems:
Advanced Technology System 4 (ATS-4), also known as El Capitan (see El Capitan)
Profiling with Kokkos Tools on El Capitan (see Profiling with Kokkos Tools)
8.6.1. El Capitan
Note
This section will be updated with some more content soon.
An example for performing simulations on El Capitan is provided below.
# first, copy templatedir into something useful
cp -a templatedir useful
# next, go into the run folder
cd useful
# submit job and set parameters on command line if desired
# this example sets L (aka lammps_len) to 64
# this example turns on Kokkos Tools profiling (aka is_kokkos_tools)
# this example runs on 1 node (aka --nodes=1)
lammps_len=64 is_kokkos_tools=1 flux batch --nodes=1 lammps_batch_elcapitan.sh
8.6.1.1. Profiling with Kokkos Tools
Scripts are provided to clone and build Kokkos Tools. The steps to do both are provided below.
# go into the LAMMPS documentation folder
cd docs/32_lammpsACE
# clone Kokkos Tools
./kokkos_tools_clone.sh
# build Kokkos Tools' Space Time
./kokkos_tools_build_elcapitan.sh
Once built, the command line variable is_kokkos_tools can be set
to 1 for the batch script to turn it on. After a successful run,
it will output additional memory information. An example of this (for
L equal to 64) on El Capitan is provided below that shows
approximately 99.6 GB of memory allocated on each GPU.
KOKKOS HIP SPACE:
===================
MAX MEMORY ALLOCATED: 99615719.6 kB
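To relate this report to the 50% to 80% memory target described under Figure of Merit, the reported value can be converted to GB and compared against the device capacity. The sketch below assumes "kB" means 1000 bytes (consistent with the 99.6 GB quoted above); the function name and the capacity value are placeholders to be replaced with the actual per-GPU memory of the target system.

```python
import re


def max_allocated_gb(report_text: str) -> float:
    """Extract 'MAX MEMORY ALLOCATED' from a Kokkos Tools report, in GB."""
    match = re.search(r"MAX MEMORY ALLOCATED:\s*([0-9.]+)\s*kB", report_text)
    if match is None:
        raise ValueError("no 'MAX MEMORY ALLOCATED' line found")
    # Assumption: decimal units, 1 kB = 1e3 bytes and 1 GB = 1e9 bytes.
    return float(match.group(1)) * 1.0e3 / 1.0e9


sample_report = """KOKKOS HIP SPACE:
===================
MAX MEMORY ALLOCATED: 99615719.6 kB
"""

used_gb = max_allocated_gb(sample_report)
capacity_gb = 128.0  # placeholder: substitute the real per-GPU memory capacity
print(f"{used_gb:.1f} GB allocated, {used_gb / capacity_gb:.0%} of capacity")
```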
8.7. Verification of Results
Additional information:
The sub-section Compute Figure of Merit describes how to compute the FOM
Single-node results from LAMMPS are provided on the following systems:
Advanced Technology System 4 (ATS-4), also known as El Capitan (see El Capitan - Single Node)
Multi-node results from LAMMPS are provided on the following systems:
Advanced Technology System 4 (ATS-4), also known as El Capitan (see El Capitan - Many Nodes)
8.7.1. Compute Figure of Merit
The figure of merit (FOM) is automatically computed by LAMMPS. The benchmark run is broken into two phases; extract the FOM from the last phase. The relevant excerpt from the “log.lammps” output is below.
Step CPU Temp E_pair TotEng Press v_delenergy v_delpress
640 0 299.7264 -3834241 -3793616.4 62562.774 -3.7252903e-08 4.8748916e-10
650 5.1882405 300.1416 -3834085.9 -3793405 62656.487 3.7252903e-08 2.2555469e-10
660 10.389581 300.04536 -3834003.9 -3793336 62705.836 -1.4901161e-08 2.910383e-11
<snip>
1260 323.38353 300.55705 -3834187.5 -3793450.4 62842.117 9.778887e-09 1.5279511e-10
1270 328.58739 300.25528 -3834141.7 -3793445.4 62861.607 1.0244548e-08 -5.0931703e-10
1280 333.79045 300.1357 -3834154.7 -3793474.6 62856.262 -1.1641532e-08 1.6734703e-10
Loop time of 333.812 on 4 procs for 640 steps with 1048576 atoms
Performance: 0.083 ns/day, 289.767 hours/ns, 1.917 timesteps/s, 2.010 Matom-step/s
45.1% CPU use with 4 MPI tasks x 1 OpenMP threads
The FOM is the quantity Matom-step/s, which in this example is 2.010.
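Extracting the last-phase FOM from a log can be automated. The following is a minimal sketch, assuming each phase ends with a "Performance:" line in the format shown above; the function name is illustrative, and the first-phase line in the sample text is invented purely to demonstrate that the last value is the one returned.

```python
import re

# Matches the number immediately preceding "Matom-step/s".
_FOM_PATTERN = re.compile(r"(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+Matom-step/s")


def last_phase_fom(log_text: str) -> float:
    """Return the Matom-step/s value of the final phase in a LAMMPS log."""
    values = _FOM_PATTERN.findall(log_text)
    if not values:
        raise ValueError("no 'Matom-step/s' entry found in log")
    return float(values[-1])


sample_log = """\
Performance: 0.082 ns/day, 292.683 hours/ns, 1.898 timesteps/s, 1.990 Matom-step/s
Loop time of 333.812 on 4 procs for 640 steps with 1048576 atoms
Performance: 0.083 ns/day, 289.767 hours/ns, 1.917 timesteps/s, 2.010 Matom-step/s
"""  # first Performance line is hypothetical; the last is from the sample output

print(last_phase_fom(sample_log))
```

In practice the text would come from reading “log.lammps” in the run directory, for example with `open("log.lammps").read()`.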
8.7.2. El Capitan - Single Node
Note
This section will be updated with some more content soon.
A single-node example is below that showcases 2.010 Mega atom steps per second per node. The other relevant parameters are displayed as part of the output.
Step CPU Temp E_pair TotEng Press v_delenergy v_delpress
640 0 299.7264 -3834241 -3793616.4 62562.774 -3.7252903e-08 4.8748916e-10
650 5.1882405 300.1416 -3834085.9 -3793405 62656.487 3.7252903e-08 2.2555469e-10
660 10.389581 300.04536 -3834003.9 -3793336 62705.836 -1.4901161e-08 2.910383e-11
<snip>
1260 323.38353 300.55705 -3834187.5 -3793450.4 62842.117 9.778887e-09 1.5279511e-10
1270 328.58739 300.25528 -3834141.7 -3793445.4 62861.607 1.0244548e-08 -5.0931703e-10
1280 333.79045 300.1357 -3834154.7 -3793474.6 62856.262 -1.1641532e-08 1.6734703e-10
Loop time of 333.812 on 4 procs for 640 steps with 1048576 atoms
Performance: 0.083 ns/day, 289.767 hours/ns, 1.917 timesteps/s, 2.010 Matom-step/s
45.1% CPU use with 4 MPI tasks x 1 OpenMP threads
8.7.3. El Capitan - Many Nodes
Note
This section will be updated with some more content soon.
8.8. References
LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in’t Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, S. J. Plimpton, Comp Phys Comm, 271 (2022) 108171.
LAMMPS Developers, ‘LAMMPS Molecular Dynamics Simulator’, 2026. [Online]. Available: https://lammps.org. [Accessed: 15- Feb- 2026]
LAMMPS Developers, ‘LAMMPS Documentation’, 2026. [Online]. Available: https://docs.lammps.org/Manual.html. [Accessed: 15- Feb- 2026]
LAMMPS Developers, ‘pair_style pace command - LAMMPS Documentation’, 2026. [Online]. Available: https://docs.lammps.org/pair_pace.html#description
Lysogorskiy, Y., Oord, C.v.d., Bochkarev, A. et al., Performant implementation of the atomic cluster expansion (PACE) and application to copper and silicon. NPJ Comput. Mater. 7, 97 (2021). https://doi.org/10.1038/s41524-021-00559-9
Anders Johansson, Evan Weinberg, Christian Trott, Megan McCarthy, and Stan Moore. 2025. LAMMPS-KOKKOS: Performance Portable Molecular Dynamics Across Exascale Architectures. In Proceedings of the SC ‘25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC Workshops ‘25). Association for Computing Machinery, New York, NY, USA, 1217–1232. https://doi.org/10.1145/3731599.3767498