Adding a System
This guide is intended for those who would like to add a new system to benchpark, such
as vendors, system administrators, or application developers. Benchpark provides an API
for representing system specifications as objects and options to customize the
specification on the command line. System specifications are defined in system.py
files located in the systems directory: benchpark/systems/<system>/.
To determine if you need to create a new system:
1. Identify a system in Benchpark with the same hardware. See System Specifications for hardware descriptions of all available benchpark systems.
2. If a system with the same hardware does not exist, add a new hardware description, as described in A. Adding System Hardware Specs.
3. Identify the same software stack description. Typically, if the same hardware is already used by Benchpark, the same software stack may already be specified - either because the same vendor software stack is used on this hardware, or because a software stack of your datacenter is already specified. If a system exists with the same software stack, add your system to that system.py as a value under the cluster variant (which may instead be named instance_type), and specify your system's specific resource configuration under the id_to_resources dictionary (see the sketch after this list).
4. If the same software stack description does not exist, determine if there is one that can be parameterized to match yours; otherwise, proceed with adding a new system in B. Creating the System Definition (system.py).
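For example, if an existing system such as llnl-cluster already matched your software stack, registering a new cluster would look roughly like the sketch below; the cluster name, variant values, and resource numbers are illustrative placeholders, not the actual llnl-cluster definition.
variant(
    "cluster",
    values=("ruby", "dane", "mynewcluster"),  # add your cluster name here
    default="ruby",
    description="Cluster name",
)

id_to_resources = {
    ...
    "mynewcluster": {
        "sys_cores_per_node": 64,  # placeholder: cores per node on your cluster
        "sys_mem_per_node_GB": 256,  # placeholder: node memory in GB
    },
}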
A. Adding System Hardware Specs
We list hardware descriptions of systems specified in Benchpark in the System Catalogue in System Specifications. If you are running on a system with an accelerator, first find an existing system with the same accelerator vendor, and then, secondarily, match the actual accelerator if you can.
accelerator.vendor - Company name
accelerator.name - Product name
accelerator.ISA - Instruction set architecture
accelerator.uArch - Microarchitecture
Once you have found an existing system with a similar accelerator or if you do not have an accelerator, match the following processor specs as closely as you can.
processor.vendor - Company name
processor.name - Product name
processor.ISA - Instruction set architecture
processor.uArch - Microarchitecture
And add the interconnect vendor and product name.
interconnect.vendor - Company name
interconnect.name - Product name
Finally, match the integrator vendor and name.
integrator.vendor - Company name
integrator.name - Product name
For example, if your system has an NVIDIA A100 GPU and Intel x86 Icelake CPUs, a similar config would share the A100 GPU, while the CPU architecture may or may not match. Or, if your system has no GPUs and instead has SapphireRapids CPUs, the closest match would be another system with x86_64, Xeon Platinum, SapphireRapids.
If there is not an exact match, you may add a new directory in the
systems/all_hardware_descriptions/system_name where system_name follows the
naming convention:
[INTEGRATOR][-MICROARCHITECTURE][-ACCELERATOR][-NETWORK]
where:
INTEGRATOR = Integrator Company name
MICROARCHITECTURE = CPU Microarchitecture
ACCELERATOR = Accelerator Product Name
NETWORK = Network Product Name
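For example, the AWS tutorial system used in the rest of this guide keeps its description under AWS_Tutorial-sapphirerapids-EFA/, i.e., integrator AWS_Tutorial, Sapphire Rapids microarchitecture, no accelerator, and an EFA network.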
In the systems/all_hardware_descriptions/system_name directory, add a
hardware_description.yaml that follows the YAML format of existing
hardware_description.yaml files.
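As a rough illustration only, a CPU-only hardware description might look like the sketch below. The nesting is assumed from the dotted spec names above (processor.vendor, interconnect.name, etc.); copy the authoritative schema and key names from an existing hardware_description.yaml, and treat every value here as a placeholder.
integrator:
  vendor: AWS_Tutorial
  name: c7i
processor:
  vendor: Intel
  name: Xeon_Platinum
  ISA: x86_64
  uArch: sapphirerapids
interconnect:
  vendor: AWS
  name: EFA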
B. Creating the System Definition (system.py)
Now that you have defined the hardware description for your system, you can create
the system.py, which involves defining the software on your system. This includes
defining system resources, compilers, and pre-installed packages. The mandatory steps to
create a system.py are:
1. Creating the System Class - The class must inherit from the benchpark System class.
2. Specify the Class Initializer and Resources - At least one cluster must be defined.
3. Add Compiler Definitions - At least one compiler must be defined.
1. Creating the System Class
In this example, we will recreate a fully-functional example of the AWS system.py
that we use for benchpark tutorials (see aws-tutorial/system.py). To
start, we import the base benchpark System class, which our AwsTutorial system
will inherit from. We also import the maintainers and variant directives, which provide
the utilities to track a maintainer by their GitHub username and to specify
configurable properties of our system. There are many similar types of AWS nodes that
differ only in terms of the number of processors and/or memory (but otherwise have the
same system packages available). This can be encoded with a variant in a benchpark
system - the user can indicate what type of instance they are creating, and the system
description will reflect the instance type chosen. We can specify the different AWS
instances that share this same hardware and software specification using the
instance_type variant.
Note
Most system classes in Benchpark have a similar concept, but they often refer to physical (named) clusters with very similar configs, and so they typically use the term “cluster” rather than “instance_type”.
from benchpark.directives import maintainers, variant
from benchpark.system import System


class AwsTutorial(System):
    maintainers("michaelmckinsey1")

    variant(
        "instance_type",
        values=("c7i.12xlarge", "c7i.24xlarge"),
        default="c7i.12xlarge",
        description="AWS instance type",
    )
2. Specify the Class Initializer and Resources
When defining __init__() for our system, we invoke the parent class
System.__init__(), and set important system attributes using the
id_to_resources dictionary, which contains information for each cluster or
instance_type. Attributes that are common across all instance_type values are
defined a single time inside the __init__() function:
system_site - The name of the site where the cluster/instance_type is located.
programming_models - List of applicable programming models. MPI is assumed for every system in benchpark, so you do not need to add it here. For this system, we add OpenMPCPUOnlySystem (different from GPU OpenMP). If we had NVIDIA accelerators, we would add CudaSystem to this list, and ROCmSystem for AMD.
scheduler - The job scheduler.
hardware_key - Defines a path to the yaml description you just created in the previous step A. Adding System Hardware Specs.
sys_cores_per_node - The number of hardware cores per node (set per instance via id_to_resources).
sys_mem_per_node_GB - The amount of node memory in gigabytes (set per instance via id_to_resources).
This information is used to determine the necessary resource allocation request for any experiment initialized with your chosen instance.
from benchpark.directives import maintainers, variant
from benchpark.openmpsystem import OpenMPCPUOnlySystem
from benchpark.paths import hardware_descriptions
from benchpark.system import System


class AwsTutorial(System):
    maintainers("michaelmckinsey1")

    id_to_resources = {
        "c7i.24xlarge": {
            "sys_cores_per_node": 96,
            "sys_mem_per_node_GB": 192,
        },
        "c7i.12xlarge": {
            "sys_cores_per_node": 48,
            "sys_mem_per_node_GB": 96,
        },
    }

    variant(
        "instance_type",
        values=("c7i.12xlarge", "c7i.24xlarge"),
        default="c7i.12xlarge",
        description="AWS instance type",
    )

    def __init__(self, spec):
        super().__init__(spec)
        # Common attributes across instances
        self.programming_models = [OpenMPCPUOnlySystem()]
        self.system_site = "aws"
        self.scheduler = "flux"
        self.hardware_key = (
            str(hardware_descriptions)
            + "/AWS_Tutorial-sapphirerapids-EFA/hardware_description.yaml"
        )
        attrs = self.id_to_resources.get(self.spec.variants["instance_type"][0])
        for k, v in attrs.items():
            setattr(self, k, v)
3. Add Compiler Definitions
We define the compilers that are available on our system by implementing the
compute_compilers_section() function. Here are the general steps for writing
this function, followed by our AWS example:
1. For each compiler, create the necessary config with compiler_def().
2. For each type of compiler (gcc, intel, etc.), combine them with compiler_section_for().
3. Merge the compiler definitions with merge_dicts (this step is unnecessary if you have only one type of compiler; see the sketch below).
Generally, you will want to compose a minimal list of compilers: e.g., if you want to compile your benchmark with the oneAPI compiler and have multiple versions to choose from, you would add a variant to the system so that the config exposes only the selected one, as sketched below.
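The following hedged sketch (not part of the AWS tutorial system) illustrates both ideas: a hypothetical oneapi_version variant selects which of two installed oneAPI versions to expose, and merge_dicts combines that section with a gcc section. The version numbers, prefixes, executable names, the "oneapi" family label, and the merge_dicts import location are all assumptions to verify against your benchpark checkout.
from benchpark.directives import variant
from benchpark.system import System, compiler_def, compiler_section_for

# NOTE: import merge_dicts from wherever it lives in your benchpark checkout.


class HypotheticalSystem(System):
    ...

    # Hypothetical variant: choose which installed oneAPI version to expose.
    variant(
        "oneapi_version",
        values=("2024.2.0", "2025.0.0"),
        default="2025.0.0",
        description="oneAPI compiler version",
    )

    def compute_compilers_section(self):
        v = self.spec.variants["oneapi_version"][0]  # value chosen by the user
        gcc_section = compiler_section_for(
            "gcc",
            [
                compiler_def(
                    "gcc@11.4.0 languages=c,c++",
                    "/usr/",
                    {"c": "gcc", "cxx": "g++"},
                )
            ],
        )
        oneapi_section = compiler_section_for(
            "oneapi",
            [
                compiler_def(
                    f"oneapi@{v}",
                    f"/opt/intel/oneapi/compiler/{v}/",  # placeholder prefix
                    {"c": "icx", "cxx": "icpx", "fortran": "ifx"},
                )
            ],
        )
        # merge_dicts combines the per-family sections into a single config;
        # unnecessary when only one compiler family is defined.
        return merge_dicts(gcc_section, oneapi_section)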
For our AWS system, the compiler we define is gcc@11.4.0. For the
compiler_def(), we must at minimum specify the spec, prefix, and exes:
spec - Similar to package specs, name@version. GCC in particular also needs the languages variant, where the list of languages depends on the available exes (e.g., do not include “fortran” if gfortran is not available). If you are not using GCC or Spack as your package manager, languages is unnecessary.
prefix - Prefix to the compiler binary directory, e.g., /usr/ for /usr/bin/gcc
exes - Dictionary to map c, cxx, and fortran to the appropriate file found in the prefix.
from benchpark.directives import maintainers, variant
from benchpark.openmpsystem import OpenMPCPUOnlySystem
from benchpark.paths import hardware_descriptions
from benchpark.system import System, compiler_def, compiler_section_for


class AwsTutorial(System):
    ...

    def compute_compilers_section(self):
        return compiler_section_for(
            "gcc",
            [
                compiler_def(
                    "gcc@11.4.0 languages=c,c++,fortran",
                    "/usr/",
                    {"c": "gcc", "cxx": "g++", "fortran": "gfortran-11"},
                )
            ],
        )
4. Add a Software Section
Here we define the compute_software_section(), where at minimum we must define the
default-compiler for Ramble. This is trivial for the single compiler that we have,
gcc@11.4.0.
from benchpark.directives import maintainers, variant
from benchpark.openmpsystem import OpenMPCPUOnlySystem
from benchpark.paths import hardware_descriptions
from benchpark.system import System, compiler_def, compiler_section_for


class AwsTutorial(System):
    ...

    def compute_software_section(self):
        return {
            "software": {
                "packages": {
                    "default-compiler": {"pkg_spec": "gcc@11.4.0"},
                }
            }
        }
5. Add Software Definitions
Finally, we define the compute_packages_section() function, where you can include
any package that you would like the package manager, such as Spack, to find on the
system; this means it will not build that package from source and will use your system
package instead. For each package that you include, you need to define its spec
(name@version) and the system path prefix to the package. Additionally, for Spack,
you need to set buildable: False to tell Spack not to build that package.
At minimum, we recommend defining externals for cmake and mpi (users also
typically define externals for other libraries, e.g., math libraries like blas and
lapack). This is because certain packages (e.g., cmake) can take a long time to
build, and packages such as mpi, blas, and lapack can influence runtime
performance significantly, so it is prudent to use the versions optimized for our system.
Additionally, for systems with accelerators, define externals for CUDA and ROCm runtime
libraries (see externals examples for a CUDA system,
or a ROCm system).
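For example, on a hypothetical CUDA system, the packages section could include an entry like the following; the version and prefix are placeholders, and the linked CUDA/ROCm system examples show real definitions.
"cuda": {
    "externals": [{"spec": "cuda@12.4.1", "prefix": "/usr/local/cuda-12.4"}],
    "buildable": False,
},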
See 1. Adding/Updating System Packages for help on how to search for the packages available on
your system.
Note
For packages that declare virtual dependencies, e.g., depends_on("mpi"), you
need to define a virtual package "mpi": {"buildable": False}, followed by a
definition of at least one provider of this package (see the provider definition for
openmpi in our example). This is to ensure Spack uses the provider we specified,
and does not try to build another MPI package. See a similar example for blas,
lapack, and their provider atlas.
from benchpark.directives import maintainers, variant
from benchpark.openmpsystem import OpenMPCPUOnlySystem
from benchpark.paths import hardware_descriptions
from benchpark.system import System, compiler_def, compiler_section_for


class AwsTutorial(System):
    ...

    def compute_packages_section(self):
        return {
            "packages": {
                "blas": {"buildable": False},
                "lapack": {"buildable": False},
                "atlas": {
                    "externals": [{"spec": "atlas@3.10.3", "prefix": "/usr"}],
                },
                "mpi": {"buildable": False},
                "openmpi": {
                    "externals": [
                        {
                            "spec": "openmpi@4.0%gcc@11.4.0",
                            "prefix": "/usr",
                        }
                    ]
                },
                "cmake": {
                    "externals": [{"spec": "cmake@4.1.1", "prefix": "/usr"}],
                    "buildable": False,
                },
                ...
            }
        }
6. Validating the System
To manually validate that your new system works, you should start by initializing your system:
benchpark system init --dest=aws-tutorial aws-tutorial
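You can also initialize the system with a non-default variant value. Assuming benchpark system init accepts the same variant=value syntax used by other benchpark commands (e.g., cluster=ruby in the appendix below), and using a hypothetical destination name, selecting the larger instance type would look like:
benchpark system init --dest=aws-tutorial-24x aws-tutorial instance_type=c7i.24xlarge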
If this completes without errors, you can continue by creating a benchmark (see Adding a Benchmark).
System Appendix
1. Adding/Updating System Packages
External package definitions can be added/updated from the output of benchpark system
external. If you don’t have any packages yet, define compute_packages_section as
an empty dictionary:
def compute_packages_section(self):
    return {
        "packages": {}
    }
Then, whether or not you have packages, run benchpark system external <system>
cluster=<cluster>:
[ruby]$ benchpark system external llnl-cluster cluster=ruby
==> The following specs have been detected on this system and added to /g/g20/mckinsey/.benchmark/spack/etc/spack/packages.yaml
cmake@3.23.1 cmake@3.26.5 gmake@4.2.1 hwloc@2.11.2 python@2.7.18 python@2.7.18 python@3.6.8 python@3.9.12 python@3.10.8 python@3.12.8 tar@1.30
The Packages are different. Here are the differences:
{'dictionary_item_added': ["root['gmake']['buildable']"],
'dictionary_item_removed': ["root['elfutils']", "root['papi']", "root['unwind']", "root['blas']", "root['lapack']", "root['fftw']", "root['mpi']"],
'iterable_item_added': {"root['cmake']['externals'][1]": {'prefix': '/usr/tce',
'spec': 'cmake@3.23.1'},
"root['python']['externals'][1]": {'prefix': '/usr',
'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl~tkinter+uuid+zlib'},
"root['python']['externals'][2]": {'prefix': '/usr',
'spec': 'python@3.6.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
"root['python']['externals'][3]": {'prefix': '/usr/tce',
'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
"root['python']['externals'][4]": {'prefix': '/usr/tce',
'spec': 'python@3.9.12+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
"root['python']['externals'][5]": {'prefix': '/usr/workspace/wsa/mckinsey/venv/benchpark-3.12.8',
'spec': 'python@3.12.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}},
'values_changed': {"root['cmake']['externals'][0]['prefix']": {'new_value': '/usr',
'old_value': '/usr/tce/packages/cmake/cmake-3.26.3'},
"root['cmake']['externals'][0]['spec']": {'new_value': 'cmake@3.26.5',
'old_value': 'cmake@3.26.3'},
"root['hwloc']['externals'][0]['spec']": {'new_value': 'hwloc@2.11.2',
'old_value': 'hwloc@2.9.1'},
"root['python']['externals'][0]['prefix']": {'new_value': '/usr/WS1/mckinsey/venv/python-3.10.8',
'old_value': '/usr/tce/packages/python/python-3.9.12/'},
"root['python']['externals'][0]['spec']": {'new_value': 'python@3.10.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib',
'old_value': 'python@3.9.12'}}}
Here are all of the new packages:
{'cmake': {'buildable': False,
'externals': [{'prefix': '/usr', 'spec': 'cmake@3.26.5'},
{'prefix': '/usr/tce', 'spec': 'cmake@3.23.1'}]},
'gmake': {'buildable': False,
'externals': [{'prefix': '/usr', 'spec': 'gmake@4.2.1'}]},
'hwloc': {'buildable': False,
'externals': [{'prefix': '/usr', 'spec': 'hwloc@2.11.2'}]},
'python': {'buildable': False,
'externals': [{'prefix': '/usr/WS1/mckinsey/venv/python-3.10.8',
'spec': 'python@3.10.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
{'prefix': '/usr',
'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl~tkinter+uuid+zlib'},
{'prefix': '/usr',
'spec': 'python@3.6.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
{'prefix': '/usr/tce',
'spec': 'python@2.7.18+bz2+crypt+ctypes+dbm~lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
{'prefix': '/usr/tce',
'spec': 'python@3.9.12+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat~pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'},
{'prefix': '/usr/workspace/wsa/mckinsey/venv/benchpark-3.12.8',
'spec': 'python@3.12.8+bz2+crypt+ctypes+dbm+lzma+nis+pyexpat+pythoncmd+readline+sqlite3+ssl+tix+tkinter+uuid+zlib'}]},
'tar': {'buildable': False,
'externals': [{'prefix': '/usr', 'spec': 'tar@1.30'}]}}
The command should be run on a cluster that is defined for the given system, e.g.,
ruby for llnl-cluster. Use this output to update the package definitions in your
system.py’s compute_packages_section().
For packages that are not found by benchpark system external, you can locate
them manually, for example with the module command if your system has environment
modules:
[dane6:~]$ module display gcc/12.1.1
...
prepend_path("PATH","/usr/tce/packages/gcc/gcc-12.1.1/bin")
Therefore, the prefix is /usr/tce/packages/gcc/gcc-12.1.1/ and the spec is
gcc@12.1.1.
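Assuming this gcc installation also provides g++ and gfortran in the same bin directory (worth verifying with ls), the corresponding compiler_def() in compute_compilers_section() would look roughly like:
compiler_def(
    "gcc@12.1.1 languages=c,c++,fortran",
    "/usr/tce/packages/gcc/gcc-12.1.1/",
    {"c": "gcc", "cxx": "g++", "fortran": "gfortran"},
)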