Adding a System Specification

System specifications include details like

  • How many CPUs there are per node on the system

  • What pre-installed MPI/GPU libraries are available

A system description is a set of YAML files collected in a directory. You can write these files by hand, but Benchpark also provides an API that represents systems as objects and lets you customize their descriptions with command line arguments.

Using the System API to Generate a System Description

System classes are defined in var/sys_repo; once the class has been defined, you can invoke benchpark system init to generate a system configuration directory that can then be passed to benchpark setup:

benchpark system init --dest=tioga-system tioga rocm=551 compiler=cce ~gtl

where “tioga rocm=551 compiler=cce ~gtl” describes a config for Tioga that uses ROCm 5.5.1 components, a CCE compiler, and MPI without GTL support.
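
The generated directory can then be passed to benchpark setup in place of a predefined system name; for example (the benchmark/workload name here is illustrative):

benchpark setup saxpy/openmp ./tioga-system workspace/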

If you want to add support for a new system, you can add a class definition for that system in its own directory under var/sys_repo/systems/. For example, the Tioga system is defined in:

$benchpark
└── var
    └── sys_repo
        └── systems
            └── tioga
                └── system.py
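
A system class declares the variants that appear on the benchpark system init command line and generates the configuration files described below. The following is a minimal sketch, assuming a System base class and a Spack-style variant directive; check the existing classes in var/sys_repo/systems/ for the actual base-class API, since the names and defaults here are illustrative:

# Hypothetical sketch of var/sys_repo/systems/tioga/system.py;
# the actual base-class API may differ from what is shown here.
from benchpark.directives import variant
from benchpark.system import System


class Tioga(System):
    # Each variant becomes a command line option to benchpark system init,
    # e.g. "rocm=551 compiler=cce ~gtl" in the example above.
    variant("rocm", default="551", description="ROCm version")
    variant("compiler", default="cce", description="Which compiler to use")
    variant("gtl", default=False, description="Use GTL-enabled MPI")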

Static System Configurations

benchpark/configs contains a number of static, manually generated system definitions. As an alternative to implementing a new System class, you can add a new directory with a name that identifies the system.

The naming convention for these systems is as follows:

SITE-[SYSTEMNAME-][INTEGRATOR]-MICROARCHITECTURE[-GPU][-NETWORK]

where:

SITE = nosite | DATACENTERNAME

SYSTEMNAME = the name of the specific system

INTEGRATOR = COMPANY[_PRODUCTNAME][...]

MICROARCHITECTURE = CPU Microarchitecture

GPU = GPU Product Name

NETWORK = Network Product Name
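
For example, the site-specific name LLNL-Tioga-HPECray-zen3-MI250X-Slingshot listed below decodes as SITE=LLNL, SYSTEMNAME=Tioga, INTEGRATOR=HPECray, MICROARCHITECTURE=zen3, GPU=MI250X, and NETWORK=Slingshot.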

Benchpark has definitions for the following (nosite) systems:

  • nosite-AWS_PCluster_Hpc7a-zen4-EFA

  • nosite-HPECray-zen3-MI250X-Slingshot (same hardware as Frontier, Lumi, Tioga)

  • nosite-x86_64 (x86 CPU-only platform)

Benchpark has definitions for the following site-specific systems:

  • LLNL-Magma-Penguin-icelake-OmniPath

  • LLNL-Sierra-IBM-power9-V100-Infiniband (Sierra, Lassen)

  • LLNL-Tioga-HPECray-zen3-MI250X-Slingshot

The following files are required for each nosite system, placed in benchpark/configs/${SYSTEM}:

  1. system_definition.yaml describes the system hardware, including the integrator (and the name of the product node or cluster type), the processor, (optionally) the accelerator, and the network; this is the information typically recorded about a system on Top500.org. We intend to make the system definitions in Benchpark searchable and will add a schema to enforce consistency; until then, please copy the file and fill out all of the fields without changing the keys. The file also lists the specific system the config was developed and tested on, as well as known systems with the same hardware, so that users of those systems can find this specification.

system_definition:
  name: HPECray-zen3-MI250X-Slingshot # or site-specific name, e.g., Frontier at ORNL
  site:
  system: HPECray-zen3-MI250X-Slingshot
  integrator:
    vendor: HPECray
    name: EX235a
  processor:
    vendor: AMD
    name: EPYC-Zen3
    ISA: x86_64
    uArch: zen3
  accelerator:
    vendor: AMD
    name: MI250X
    ISA: GCN
    uArch: gfx90a
  interconnect:
    vendor: HPECray
    name: Slingshot11
  system-tested:
    site: LLNL
    name: tioga
    installation-year: 2022
    description: [top500](https://www.top500.org/system/180052)
  top500-system-instances:
    - Frontier (ORNL)
    - Lumi     (CSC)
    - Tioga    (LLNL)

  2. software.yaml defines the default compiler and package names that your package manager (Spack) should use to build the benchmarks on this system. software.yaml becomes the spack section in the Ramble configuration file.

software:
  packages:
    default-compiler:
      pkg_spec: 'spack_spec_for_package'
    default-mpi:
      pkg_spec: 'spack_spec_for_package'
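
For a system like the zen3/MI250X example above, the filled-in specs might look like the following (these particular package names and versions are illustrative, not prescriptive):

software:
  packages:
    default-compiler:
      pkg_spec: 'cce@16.0.0'
    default-mpi:
      pkg_spec: 'cray-mpich@8.1.26'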

  3. variables.yaml defines the system-specific launcher and job scheduler.

variables:
  timeout: '30'
  scheduler: "slurm"
  sys_cores_per_node: "128"
  sys_gpus_per_node: "4"
  sys_mem_per_node: unset
  max_request: "1000"  # n_ranks/n_nodes cannot exceed this
  n_ranks: '1000001'  # placeholder value
  n_nodes: '1000001'  # placeholder value
  batch_submit: "placeholder"
  mpi_command: "placeholder"
  # batch_queue: "pbatch"
  # batch_bank: "guest"
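
On a Slurm system, the placeholder values are typically replaced with the scheduler's submit and launch commands, written using Ramble's {variable} expansion (the exact expressions depend on your site; these are illustrative):

variables:
  # ... other keys as above ...
  batch_submit: "sbatch {execute_experiment}"
  mpi_command: "srun -N {n_nodes} -n {n_ranks}"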

When defining a site-specific system, you can be more precise about available software versions and packages, as demonstrated in Adding a Site-specific System Specification.