Adding a System
This guide is intended for those who want to run a benchmark on a new system, such as vendors, system administrators, or application developers. It assumes that a system specification does not already exist.
System specifications include two types of information:
Hardware specs in hardware_description.yaml (e.g., how many CPU cores the node has)
Software stack specs in system.py (e.g., installed compilers and libraries, along with their locations and versions)
To specify a new system:
Identify a system in Benchpark with the same hardware.
If a system with the same hardware does not exist, add a new hardware description, as described in the Adding System Hardware Specs section.
Identify an existing software stack description. If the same hardware is already used by Benchpark, a matching software stack may already be specified, either because the same vendor software stack is used on that hardware or because a software stack at your datacenter has already been specified.
If the same software stack description does not exist, determine whether an existing one can be parameterized to match yours.
If no existing software stack description can be parameterized to match, add a new one, as described in the Adding or Parameterizing System Software Stack section.
1. Adding System Hardware Specs
Hardware descriptions for the systems specified in Benchpark are listed in the System Catalogue in System Specifications.
If you are running on a system with an accelerator, first find an existing system with the same accelerator vendor, and then, if you can, match the specific accelerator:
accelerator.vendor
accelerator.name
Once you have found an existing system with a similar accelerator, or if you do not have an accelerator, match the following processor specs as closely as you can:
processor.name
processor.ISA
processor.uArch
For example, if your system has an NVIDIA A100 GPU and Intel x86 Icelake CPUs, a similar config would share the A100 GPU, while the CPU architecture may or may not match. Or, if you do not have GPUs and instead have SapphireRapids CPUs, the closest match would be another system with x86_64, Xeon Platinum, SapphireRapids.
If there is not an exact match, you may add a new directory, systems/all_hardware_descriptions/system_name, where system_name follows the naming convention:
[INTEGRATOR]-MICROARCHITECTURE[-GPU][-NETWORK]
where:
INTEGRATOR = COMPANY[_PRODUCTNAME][...]
MICROARCHITECTURE = CPU Microarchitecture
GPU = GPU Product Name
NETWORK = Network Product Name
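For example, following this convention, an HPE Cray system with AMD Zen3 CPUs, MI250X GPUs, and a Slingshot network might be named HPECray-zen3-MI250X-Slingshot.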
In the systems/all_hardware_descriptions/system_name directory, add a hardware_description.yaml that follows the yaml format of the existing hardware_description.yaml files.
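As a rough sketch only (the authoritative set of keys should be taken from an existing hardware_description.yaml; the top-level layout and all values here are illustrative), the file records the specs discussed above:

# illustrative sketch -- copy an existing hardware_description.yaml
# for the authoritative keys; all values below are hypothetical
system_definition:
  name: Dell-sapphirerapids-A100    # follows the naming convention above
  integrator:
    vendor: CompanyName
    name: ProductName
  processor:
    name: Xeon-Platinum-8480
    ISA: x86_64
    uArch: sapphirerapids
  accelerator:
    vendor: NVIDIA
    name: A100
  network:
    vendor: CompanyName
    name: ProductName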
2. Adding or Parameterizing System Software Stack
system.py in Benchpark provides an API to represent a system software stack as a command-line-parameterizable object.
If none of the available software stack specifications match your system, you may add a new system directory in the systems directory, named following the convention:
SITE-SYSTEMNAME
where:
SITE = nosite | abbreviated datacenter name
SYSTEMNAME = the name of the specific system
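For example, a system named mycluster at a site abbreviated MYSITE would go in systems/MYSITE-mycluster, while a specification not tied to any particular site would use the nosite prefix, e.g., systems/nosite-mycluster (both names are hypothetical, for illustration only).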
Next, copy the system.py from the system with the most similar software stack into the new system directory, and update it to match your system. For example, the generic-x86 system software stack is defined in:
$benchpark
├── systems
│   ├── generic-x86
│   │   ├── system.py
The System base class, defined in /lib/benchpark/system.py, is shown below; some or all of its functions can be overridden to define custom system behavior.
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0

import hashlib
import importlib.util
import os
import pathlib
import sys
import tempfile

import yaml

import benchpark.paths
from benchpark.directives import ExperimentSystemBase
import benchpark.repo
from benchpark.runtime import RuntimeResources
from typing import Dict, Tuple
import benchpark.spec
import benchpark.variant

bootstrapper = RuntimeResources(benchpark.paths.benchpark_home)  # noqa
bootstrapper.bootstrap()  # noqa

import ramble.config as cfg  # noqa
import ramble.language.language_helpers  # noqa
import ramble.language.shared_language  # noqa
import spack.util.spack_yaml as syaml  # noqa

# We cannot import this the normal way because it comes from modern Spack,
# and mixing modern Spack modules with Ramble modules that depend on
# ancient Spack will cause errors. This module is safe to load as an
# individual because it is not used by Ramble.
# The following code block implements the line
#   import spack.schema.packages as packages_schema
schemas = {
    "spack.schema.packages": f"{bootstrapper.spack_location}/lib/spack/spack/schema/packages.py",
    "spack.schema.compilers": f"{bootstrapper.spack_location}/lib/spack/spack/schema/compilers.py",
}


def load_schema(schema_id, schema_path):
    schema_spec = importlib.util.spec_from_file_location(schema_id, schema_path)
    schema = importlib.util.module_from_spec(schema_spec)
    sys.modules[schema_id] = schema
    schema_spec.loader.exec_module(schema)
    return schema


packages_schema = load_schema(
    "spack.schema.packages",
    f"{bootstrapper.spack_location}/lib/spack/spack/schema/packages.py",
)
compilers_schema = load_schema(
    "spack.schema.compilers",
    f"{bootstrapper.spack_location}/lib/spack/spack/schema/compilers.py",
)

_repo_path = benchpark.repo.paths[benchpark.repo.ObjectTypes.systems]


def _hash_id(content_list):
    sha256_hash = hashlib.sha256()
    for x in content_list:
        sha256_hash.update(x.encode("utf-8"))
    return sha256_hash.hexdigest()


class System(ExperimentSystemBase):
    variants: Dict[
        str,
        Tuple["benchpark.variant.Variant", "benchpark.spec.ConcreteSystemSpec"],
    ]

    def __init__(self, spec):
        self.spec: "benchpark.spec.ConcreteSystemSpec" = spec
        super().__init__()

    def initialize(self):
        self.external_resources = None

        self.sys_cores_per_node = None
        self.sys_gpus_per_node = None
        self.sys_mem_per_node = None
        self.scheduler = None
        self.timeout = "120"
        self.queue = None

        self.required = ["sys_cores_per_node", "scheduler", "timeout"]

    def generate_description(self, output_dir):
        self.initialize()
        output_dir = pathlib.Path(output_dir)

        variables_yaml = output_dir / "variables.yaml"
        with open(variables_yaml, "w") as f:
            f.write(self.variables_yaml())

        self.external_packages(output_dir)
        self.compiler_description(output_dir)

        spec_hash = self.system_uid()
        system_id_path = output_dir / "system_id.yaml"
        with open(system_id_path, "w") as f:
            f.write(
                f"""\
system:
  name: {self.__class__.__name__}
  spec: {str(self.spec)}
  config-hash: {spec_hash}
"""
            )

    def system_uid(self):
        return _hash_id([str(self.spec)])

    def _merge_config_files(self, schema, selections, dst_path, override=False):
        data = cfg.read_config_file(selections[0], schema)
        for selection in selections[1:]:
            cfg.merge_yaml(data, cfg.read_config_file(selection, schema))

        if override:
            for top_level_key, _ in data.items():
                break
            top_level_key.override = True

        with open(dst_path, "w") as outstream:
            syaml.dump_config(data, outstream)

    def external_pkg_configs(self):
        return None

    def compiler_configs(self):
        return None

    def external_packages(self, output_dir):
        selections = self.external_pkg_configs()
        if not selections:
            return

        aux = output_dir / "auxiliary_software_files"
        os.makedirs(aux, exist_ok=True)

        aux_packages = aux / "packages.yaml"
        self._merge_config_files(packages_schema.schema, selections, aux_packages)

    def compiler_description(self, output_dir):
        selections = self.compiler_configs()
        if not selections:
            return

        aux = output_dir / "auxiliary_software_files"
        os.makedirs(aux, exist_ok=True)

        aux_compilers = aux / "compilers.yaml"
        self._merge_config_files(
            compilers_schema.schema, selections, aux_compilers, override=True
        )

    def system_specific_variables(self):
        return {}

    def variables_yaml(self):
        for attr in self.required:
            if not getattr(self, attr, None):
                raise ValueError(f"Missing required info: {attr}")

        optionals = list()
        for opt in ["sys_gpus_per_node", "sys_mem_per_node", "queue"]:
            if getattr(self, opt, None):
                optionals.append(f"{opt}: {getattr(self, opt)}")

        system_specific = list()
        for k, v in self.system_specific_variables().items():
            system_specific.append(f"{k}: {v}")

        extra_variables = optionals + system_specific
        indent = " " * 2
        extras_as_cfg = ""
        if extra_variables:
            extras_as_cfg = f"\n{indent}".join(extra_variables)

        return f"""\
# SPDX-License-Identifier: Apache-2.0
variables:
  timeout: "{self.timeout}"
  scheduler: "{self.scheduler}"
  sys_cores_per_node: "{self.sys_cores_per_node}"
  {extras_as_cfg}
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
"""

    def _adhoc_cfgs(self):
        if not getattr(self, "_tmp_cfgs", None):
            self._tmp_cfgs = tempfile.mkdtemp()
            self._adhoc_cfg_idx = 0
        return self._tmp_cfgs

    def next_adhoc_cfg(self):
        basedir = self._adhoc_cfgs()
        self._adhoc_cfg_idx += 1
        return os.path.join(basedir, str(self._adhoc_cfg_idx))


def unique_dir_for_description(system_dir):
    system_id_path = os.path.join(system_dir, "system_id.yaml")
    with open(system_id_path, "r") as f:
        data = yaml.safe_load(f)
    name = data["system"]["name"]
    spec_hash = data["system"]["config-hash"]
    return f"{name}-{spec_hash[:7]}"
Each systems/{SYSTEM}/system.py should define a class that inherits from the System base class.
The generic-x86 system subclass should run on most x86_64 systems, but we provide it mainly as a starting point for modification or testing. Common changes include editing the scheduler or the number of cores per node, adding a GPU configuration, or adding other external compilers or packages.
To demonstrate these changes, the example below starts with the generic-x86 system.py and builds a system called Modifiedx86.
First, make a copy of the system.py file in the generic-x86 folder and move it into a new folder, e.g., systems/modified_x86/system.py.
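For example, from the top of the Benchpark repository (paths follow the layout shown above):

mkdir -p systems/modified_x86
cp systems/generic-x86/system.py systems/modified_x86/system.py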
Then, update the class name to Modifiedx86:
class Modifiedx86(System):
Next, to match our new system, we change the scheduler to slurm, the number of cores per node to 48, and the number of GPUs per node to 2:

# this sets basic attributes of our system
def initialize(self):
    super().initialize()
    self.scheduler = "slurm"
    self.sys_cores_per_node = "48"
    self.sys_gpus_per_node = "2"
Suppose the new system's GPUs are NVIDIA. We can add a variant that allows us to specify the version of CUDA we want to use, along with the locations of those CUDA installations on our system. We then add the Spack package configuration for our CUDA installations into the systems/modified_x86/externals/cuda directory (see the Sierra and Tioga systems for examples):
# import the variant feature at the top of your system.py
from benchpark.directives import variant

# this allows us to specify which cuda version we want as a command line parameter
variant(
    "cuda",
    default="11-8-0",
    values=("11-8-0", "10-1-243"),
    description="CUDA version",
)

# set this to pass to spack
def system_specific_variables(self):
    return {"cuda_arch": "70"}

# define the external package locations
def external_pkg_configs(self):
    externals = Modifiedx86.resource_location / "externals"
    cuda_ver = self.spec.variants["cuda"][0]

    selections = []
    if cuda_ver == "10-1-243":
        selections.append(externals / "cuda" / "00-version-10-1-243-packages.yaml")
    elif cuda_ver == "11-8-0":
        selections.append(externals / "cuda" / "01-version-11-8-0-packages.yaml")
    return selections
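For reference, each of those files is an ordinary Spack packages.yaml fragment that registers the CUDA installation as an external package. A minimal sketch, with a hypothetical install prefix (check the Sierra and Tioga externals directories for real examples):

packages:
  cuda:
    externals:
    - spec: cuda@11.8.0
      prefix: /usr/local/cuda-11.8   # hypothetical install location
    buildable: false                 # always use the external installation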
Note: if your externals are not installed via Spack, read the Spack documentation on modules.
Next, add any packages that can be managed by Spack, such as blas/cublas, pointing to the correct version. This generates the software configuration for Spack (software.yaml); the actual version will be rendered by Ramble when it is built:
def sw_description(self):
    return """\
software:
  packages:
    default-compiler:
      pkg_spec: gcc
    compiler-gcc:
      pkg_spec: gcc
    default-mpi:
      pkg_spec: openmpi
    blas:
      pkg_spec: cublas@{default_cuda_version}
    cublas-cuda:
      pkg_spec: cublas@{default_cuda_version}
"""
The full system.py class for the modified_x86 system should now look like this:
import pathlib

from benchpark.directives import variant
from benchpark.system import System


class Modifiedx86(System):
    variant(
        "cuda",
        default="11-8-0",
        values=("11-8-0", "10-1-243"),
        description="CUDA version",
    )

    def initialize(self):
        super().initialize()
        self.scheduler = "slurm"
        self.sys_cores_per_node = "48"
        self.sys_gpus_per_node = "2"

    def generate_description(self, output_dir):
        super().generate_description(output_dir)

        sw_description = pathlib.Path(output_dir) / "software.yaml"
        with open(sw_description, "w") as f:
            f.write(self.sw_description())

    def system_specific_variables(self):
        return {"cuda_arch": "70"}

    def external_pkg_configs(self):
        externals = Modifiedx86.resource_location / "externals"
        cuda_ver = self.spec.variants["cuda"][0]

        selections = []
        if cuda_ver == "10-1-243":
            selections.append(externals / "cuda" / "00-version-10-1-243-packages.yaml")
        elif cuda_ver == "11-8-0":
            selections.append(externals / "cuda" / "01-version-11-8-0-packages.yaml")
        return selections

    def sw_description(self):
        """This is somewhat vestigial and may be deleted later. The experiments
        will fail if these variables are not defined, though, so for now
        they are still generated (but with more-generic values).
        """
        return """\
software:
  packages:
    default-compiler:
      pkg_spec: gcc
    compiler-gcc:
      pkg_spec: gcc
    default-mpi:
      pkg_spec: openmpi
    blas:
      pkg_spec: cublas@{default_cuda_version}
    cublas-cuda:
      pkg_spec: cublas@{default_cuda_version}
"""
Once the modified system subclass is written, run:
benchpark system init --dest=modifiedx86-system modifiedx86
This will generate the required yaml configurations for your system, and you can validate that it works with a static experiment test.
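Because cuda was declared as a variant, you should also be able to select a non-default CUDA version at initialization time using the name=value spec syntax that appears in system_id.yaml below (an assumed example; check the benchpark system init help output for the exact syntax):

benchpark system init --dest=modifiedx86-system modifiedx86 cuda=10-1-243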
3. Validating the System
To manually validate your new system, you should initialize it and run an existing experiment such as saxpy. For example:
benchpark system init --dest=modifiedx86-system modifiedx86
benchpark experiment init --dest=saxpy saxpy +openmp
benchpark setup ./saxpy ./modifiedx86-system workspace/
Then run the commands provided in the output; the experiments should build and run successfully without any errors.
The following yaml files are examples of what is generated for the modified_x86 system after it is initialized:
1. system_id.yaml identifies the generated system configuration: the name of the system class, the concrete spec it was initialized with (including variant settings), and a hash of the configuration. The hardware details themselves (the integrator and the name of the product node or cluster type, the processor, optionally the accelerator, and the network; the information you will typically see recorded about a system on Top500.org) are recorded in the hardware description. We intend to make the system definitions in Benchpark searchable, and will add a schema to enforce consistency; until then, please copy an existing file and fill out all of the fields without changing the keys. Also listed are the specific system the config was developed and tested on, as well as known systems with the same hardware, so that users of those systems can find this system specification.
system:
  name: Modifiedx86
  spec: sysbuiltin.modifiedx86 cuda=11-8-0
  config-hash: 5310ebe8b2c841108e5da854c75dab931f5397a7fb41726902bb8a51ffb84a36
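The config-hash comes from _hash_id in the base class, which is simply a sha256 digest of the concrete spec string. As a quick sketch, you can reproduce it yourself:

import hashlib

# the spec string exactly as it appears in system_id.yaml
spec = "sysbuiltin.modifiedx86 cuda=11-8-0"
print(hashlib.sha256(spec.encode("utf-8")).hexdigest())  # reproduces the config-hash above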
2. software.yaml defines the default compiler and package names that your package manager (Spack) should use to build the benchmarks on this system. software.yaml becomes the spack section of the Ramble configuration file.
software:
  packages:
    default-compiler:
      pkg_spec: 'gcc'
    compiler-gcc:
      pkg_spec: 'gcc'
    default-mpi:
      pkg_spec: 'openmpi'
    blas:
      pkg_spec: cublas@{default_cuda_version}
    cublas-cuda:
      pkg_spec: cublas@{default_cuda_version}
3. variables.yaml defines the system-specific launcher and job scheduler settings.
variables:
  timeout: "120"
  scheduler: "slurm"
  sys_cores_per_node: "48"
  sys_gpus_per_node: 2
  cuda_arch: 70
  max_request: "1000" # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001' # placeholder value
  n_nodes: '1000001' # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
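Note that sys_gpus_per_node and cuda_arch appear here only because they were set in initialize() and system_specific_variables(); the base class's variables_yaml() writes the optional variables (sys_gpus_per_node, sys_mem_per_node, queue) and any system-specific variables only when they have values.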
Once you can run an experiment successfully and the yaml files look correct, the new system has been validated and you can continue with your Benchpark Workflow.