Comparing two Experiments Within Benchpark
This tutorial will guide you through building and comparing distinct binaries of the same benchmark.
As an example, we compare two builds of the quicksilver benchmark, one compiled with the gcc compiler variant and one with the intel compiler variant, on LLNL’s Ruby cluster.
Building Multiple Binaries
Create a separate system instance for each build. The parameters that differ between instances could include the compiler, the MPI library, etc. In this case, we change only the compiler variant:
benchpark system init --dest=ruby-gcc llnl-cluster cluster=ruby compiler=gcc
benchpark system init --dest=ruby-intel llnl-cluster cluster=ruby compiler=intel
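When comparing more than two builds, the system instances can be generated in a loop rather than typed one at a time. The following is a hypothetical Python helper, not part of Benchpark, that assembles exactly the benchpark system init arguments shown above for each compiler, using this tutorial's ruby-<compiler> naming convention. By default it only prints the commands; pass dry_run=False to run them with subprocess.

```python
import subprocess

def system_init_commands(compilers, cluster="ruby"):
    """Build one 'benchpark system init' command per compiler (hypothetical helper)."""
    commands = []
    for compiler in compilers:
        commands.append([
            "benchpark", "system", "init",
            f"--dest={cluster}-{compiler}",  # e.g. ruby-gcc, ruby-intel
            "llnl-cluster",
            f"cluster={cluster}",
            f"compiler={compiler}",
        ])
    return commands

def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command

run_all(system_init_commands(["gcc", "intel"]))
```

Adding a third build is then a matter of extending the compiler list.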
Creating experiment ramble.yaml
Next, create the experiment description. Its parameters could include the version, the scaling type, etc.
In this example, only the compiler changes. Because all experiment variables are the same for both builds, we only need to generate a single experiment description.
The following command initializes a quicksilver experiment configuration in the quicksilver directory. We use weak scaling with OpenMP and measure MPI metrics with Caliper.
benchpark experiment init --dest=quicksilver quicksilver caliper=mpi +weak +openmp ~single_node
Note
Running a benchmark repeatedly will overwrite the existing output. To prevent this, create multiple duplicate experiments, each with a different experiment name (e.g. --dest=quicksilver, --dest=quicksilver2).
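The duplicate experiments described in the note differ only in their --dest value. As a minimal sketch, assuming the same experiment init arguments used above and the quicksilver, quicksilver2, ... naming from the note, a hypothetical helper could generate the commands for any number of copies:

```python
def experiment_init_commands(n_copies, base="quicksilver"):
    """Return 'benchpark experiment init' commands for n_copies uniquely named experiments."""
    variants = ["caliper=mpi", "+weak", "+openmp", "~single_node"]  # same as the command above
    commands = []
    for i in range(1, n_copies + 1):
        # The first copy keeps the base name; later copies get a numeric suffix
        dest = base if i == 1 else f"{base}{i}"
        commands.append(
            ["benchpark", "experiment", "init", f"--dest={dest}", "quicksilver", *variants]
        )
    return commands

for cmd in experiment_init_commands(3):
    print(" ".join(cmd))
```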
Running multiple experiments
Now that both the system and experiment parameters have been defined, we can set up a workspace for each system/experiment pair:
$ benchpark setup quicksilver ruby-gcc workspace
$ benchpark setup quicksilver ruby-intel workspace
Next, install the benchmark along with all of its dependencies, and generate an execute_experiment shell script for each run:
$ ramble --workspace-dir workspace/quicksilver/ruby-gcc/workspace workspace setup
$ ramble --workspace-dir workspace/quicksilver/ruby-intel/workspace workspace setup
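Because the two builds go through identical setup steps, the four commands above can be scripted. This is a sketch rather than a Benchpark feature: it assumes the workspace output directory and the workspace/<experiment>/<system>/workspace layout used in this tutorial, keeps the same order (both benchpark setup calls first, then both ramble workspace setup calls), and only prints the commands unless dry_run=False.

```python
import subprocess

def setup_commands(experiment, systems, root="workspace"):
    """Return the benchpark setup commands, then the ramble workspace setup commands."""
    commands = [["benchpark", "setup", experiment, system, root] for system in systems]
    for system in systems:
        # Workspace path as used in this tutorial: <root>/<experiment>/<system>/workspace
        workspace_dir = f"{root}/{experiment}/{system}/workspace"
        commands.append(["ramble", "--workspace-dir", workspace_dir, "workspace", "setup"])
    return commands

def run(commands, dry_run=True):
    """Echo each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step

run(setup_commands("quicksilver", ["ruby-gcc", "ruby-intel"]))
```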
Completing these steps results in the following directory structure:
experiments_root/
    ramble/
    spack/
    quicksilver/
        ruby-gcc/
            workspace/
                experiments/
                    ..../
                        execute_experiment
        ruby-intel/
            workspace/
                experiments/
                    ..../
                        execute_experiment
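To confirm that setup produced an execute_experiment script for each system, the workspace can be searched with the Python standard library. The helper below is hypothetical and assumes the layout shown above, with workspace/quicksilver as the experiment root; it returns an empty mapping if that directory does not exist.

```python
from pathlib import Path

def find_execute_scripts(experiment_root):
    """Map each system directory (e.g. ruby-gcc) to the execute_experiment scripts under it."""
    root = Path(experiment_root)
    if not root.is_dir():
        return {}
    found = {}
    for system_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # "**" matches any depth of experiment subdirectories, including none
        found[system_dir.name] = sorted(
            system_dir.glob("workspace/experiments/**/execute_experiment")
        )
    return found

for system, scripts in find_execute_scripts("workspace/quicksilver").items():
    print(f"{system}: {len(scripts)} execute_experiment script(s)")
```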
Verifying build details and differences between builds
Benchpark offers two ways to double-check that each binary was built according to its specification. The first uses Spack directly:
$ spack find -L quicksilver
-- linux-rhel8-sapphirerapids / gcc@12.1.1 ----------------------
fubnce7wzgjxhkim2cylijt4cbpfhxi6 quicksilver@master
-- linux-rhel8-sapphirerapids / intel@2021.6.0-classic ----------
qwev4yodp2joikf2oxvlo224ksjcqve3 quicksilver@master
==> 2 installed packages
This output lists each installed binary along with its hash. We can use these hashes to double-check the details of each build independently. Here, we check the quicksilver spec and its dependencies by running spack spec for each binary:
spack spec quicksilver/{hash}
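With more builds, copying hashes by hand becomes error-prone. The sketch below is a hypothetical helper that parses spack find -L output in the layout shown above (a "-- <arch> / <compiler> ---" header followed by "<hash> <spec>" lines) and prints a spack spec command for each build. It relies only on the appearance of this sample output, not on a documented format, so spack find --json may be a sturdier input if your Spack version provides it.

```python
import re

HEADER = re.compile(r"^-- (?P<arch>\S+) / (?P<compiler>\S+) -+$")
ENTRY = re.compile(r"^(?P<hash>[a-z0-9]{32})\s+(?P<spec>\S+)$")

def parse_spack_find(text):
    """Return {compiler: [(hash, spec), ...]} from 'spack find -L' style output."""
    builds = {}
    compiler = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            compiler = header.group("compiler")  # applies to the entries that follow
            continue
        entry = ENTRY.match(line)
        if entry and compiler is not None:
            builds.setdefault(compiler, []).append((entry.group("hash"), entry.group("spec")))
    return builds

sample = """\
-- linux-rhel8-sapphirerapids / gcc@12.1.1 ----------------------
fubnce7wzgjxhkim2cylijt4cbpfhxi6 quicksilver@master
-- linux-rhel8-sapphirerapids / intel@2021.6.0-classic ----------
qwev4yodp2joikf2oxvlo224ksjcqve3 quicksilver@master
==> 2 installed packages
"""

for compiler, entries in parse_spack_find(sample).items():
    for spec_hash, spec in entries:
        # Package name is the part of the spec before the version
        print(f"spack spec {spec.split('@')[0]}/{spec_hash}  # {compiler}")
```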
Each command prints a dependency tree showing which variants and compiler were used for each package. The output from both commands is below:
$ spack spec quicksilver/fubnce7wzgjxhkim2cylijt4cbpfhxi6
[+] quicksilver@master%gcc@12.1.1~cuda+mpi+openmp build_system=makefile arch=linux-rhel8-sapphirerapids
[+] ^gcc-runtime@12.1.1%gcc@12.1.1 build_system=generic arch=linux-rhel8-sapphirerapids
[e] ^glibc@2.28%gcc@12.1.1 build_system=autotools arch=linux-rhel8-sapphirerapids
[e] ^gmake@4.2.1%gcc@12.1.1~guile build_system=generic patches=ca60bd9,fe5b60d arch=linux-rhel8-sapphirerapids
[e] ^mvapich2@2.3.7-gcc1211%gcc@12.1.1~alloca~cuda~debug~hwloc_graphics~hwlocv2+regcache+wrapperrpath build_system=autotools ch3_rank_bits=32 fabrics=mrail file_systems=auto patches=d98d8e7 process_managers=auto threads=multiple arch=linux-rhel8-sapphirerapids
$ spack spec quicksilver/qwev4yodp2joikf2oxvlo224ksjcqve3
[+] quicksilver@master%intel@2021.6.0-classic~cuda+mpi+openmp build_system=makefile arch=linux-rhel8-sapphirerapids
[e] ^glibc@2.28%intel@2021.6.0-classic build_system=autotools arch=linux-rhel8-sapphirerapids
[e] ^gmake@4.2.1%intel@2021.6.0-classic~guile build_system=generic patches=ca60bd9,fe5b60d arch=linux-rhel8-sapphirerapids
[e] ^mvapich2@2.3.7-intel202160classic%intel@2021.6.0-classic~alloca~cuda~debug~hwloc_graphics~hwlocv2+regcache+wrapperrpath build_system=autotools ch3_rank_bits=32 fabrics=mrail file_systems=auto patches=d98d8e7 process_managers=auto threads=multiple arch=linux-rhel8-sapphirerapids
Notice that the two dependency trees differ in the compiler used (gcc@12.1.1 vs. intel@2021.6.0-classic), and that only the gcc build includes gcc-runtime.
This comparison can also be done in a single command with the diffBuildSpecs.py script (see scripts):
spack-python lib/scripts/diffBuildSpecs.py quicksilver/{hash1} quicksilver/{hash2}
Note
spack-python is required to import the Spack libraries needed by this script. It is added to your $PATH automatically when you run benchpark setup.
The output shows the quicksilver build tree twice: the first hash compared to the second, followed by the second hash compared to the first.
The first quicksilver spec tree highlights specs that are present in quicksilver/fubnce7 but not in the other build, e.g. gcc-runtime.
Specs present in both trees are shown in white, with the differences between them highlighted in red, e.g. glibc.
$ spack-python lib/scripts/diffBuildSpecs.py --truncate quicksilver/fubnce7wzgjxhkim2cylijt4cbpfhxi6 quicksilver/qwev4yodp2joikf2oxvlo224ksjcqve3
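If the Spack Python libraries are not available, a rough text-based comparison is still possible. The sketch below is a hypothetical, much simpler stand-in for diffBuildSpecs.py, not that script: it extracts name@version%compiler from each line of spack spec output in the format shown earlier, then reports packages found in only one tree and packages whose version or compiler differ. It ignores variants and assumes one spec per line; the sample input is abbreviated from the outputs above.

```python
import re

# Matches e.g. "[e] ^glibc@2.28%gcc@12.1.1 ..." and captures name, version, compiler
SPEC_LINE = re.compile(
    r"^\s*(?:\[.\]\s+)?\^?(?P<name>[\w-]+)@(?P<version>[^%\s]+)%(?P<compiler>[\w-]+@[\w.-]+)"
)

def parse_spec_tree(text):
    """Return {package: (version, compiler)} for every spec line in 'spack spec' output."""
    tree = {}
    for line in text.splitlines():
        m = SPEC_LINE.match(line)
        if m:
            tree[m.group("name")] = (m.group("version"), m.group("compiler"))
    return tree

def diff_trees(a, b):
    """Print packages only in one tree, then packages whose version or compiler differ."""
    for name in sorted(a.keys() - b.keys()):
        print(f"only in first:  {name}")
    for name in sorted(b.keys() - a.keys()):
        print(f"only in second: {name}")
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            print(f"differs: {name}: {a[name]} vs {b[name]}")

gcc_out = """\
[+] quicksilver@master%gcc@12.1.1~cuda+mpi+openmp build_system=makefile
[+] ^gcc-runtime@12.1.1%gcc@12.1.1 build_system=generic
[e] ^glibc@2.28%gcc@12.1.1 build_system=autotools
"""
intel_out = """\
[+] quicksilver@master%intel@2021.6.0-classic~cuda+mpi+openmp build_system=makefile
[e] ^glibc@2.28%intel@2021.6.0-classic build_system=autotools
"""

diff_trees(parse_spec_tree(gcc_out), parse_spec_tree(intel_out))
```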

Running Experiments
To launch the experiments in separate job allocations, run the following commands:
ramble --workspace-dir workspace/quicksilver/ruby-gcc/workspace on
ramble --workspace-dir workspace/quicksilver/ruby-intel/workspace on
Collecting FOMs
Most benchmarks within Benchpark generate a figure of merit (FOM), which is a measure of performance. We can collect the FOMs by running the following:
ramble --workspace-dir workspace/quicksilver/ruby-gcc/workspace workspace analyze
ramble --workspace-dir workspace/quicksilver/ruby-intel/workspace workspace analyze
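Launching and analyzing both builds repeats the same command for each workspace, so both steps can be driven from one loop. A minimal sketch, assuming the workspace paths used throughout this tutorial; ramble_commands is a hypothetical helper, and the commands are only printed unless dry_run=False. Because on submits jobs, analyze is only meaningful once those jobs have finished.

```python
import subprocess

SYSTEMS = ("ruby-gcc", "ruby-intel")

def ramble_commands(action, experiment="quicksilver", systems=SYSTEMS, root="workspace"):
    """Return one ramble command per system for action "on" or "analyze"."""
    if action == "on":
        subcommand = ["on"]
    elif action == "analyze":
        subcommand = ["workspace", "analyze"]
    else:
        raise ValueError(f"unsupported action: {action}")
    return [
        ["ramble", "--workspace-dir", f"{root}/{experiment}/{system}/workspace", *subcommand]
        for system in systems
    ]

def run(commands, dry_run=True):
    """Echo each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$ " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

run(ramble_commands("on"))
# After the submitted jobs have completed:
run(ramble_commands("analyze"))
```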
Note
An example bash script that automates building and running this analysis on the LLNL Dane cluster is located at benchpark/docs/examples/compare_experiment_builds/compareExperimentBuilds.sh.
Analyzing Caliper Data with Thicket
Enabling the Caliper modifier (see Benchpark Modifiers) gives a much more detailed picture of any performance differences. Beyond comparing runtimes, we can generate a call tree profile to see which functions contribute to a performance difference.
The Caliper .cali files are generated automatically in the experiment directory. To analyze the Caliper data further, Thicket can be used to view the call tree and generate plots:
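import matplotlib.pyplot as plt
import thicket as th

# Load one .cali file per build; these file names are placeholders
tk = th.Thicket.from_caliperreader([
    "experiment_name1.cali",
    "experiment_name2.cali",
])

# Print the call tree annotated with the "time" metric
print(tk.tree(metric_column="time"))

# tk.metadata is a pandas DataFrame with one row per profile;
# this assumes it contains mpi.world.size and FOM columns
tk.metadata.plot(
    x="mpi.world.size",
    y="FOM",
    kind="scatter",
)
plt.show()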
For more information on Caliper and Thicket, refer to https://software.llnl.gov/Caliper/ and https://thicket.readthedocs.io/en/latest/.