Adding an Experiment
This guide is intended for those wanting to define a new set of experiment parameters for a given benchmark.
As with systems, Benchpark provides an API that lets you represent experiments as objects and customize their description with command line arguments.
Experiment specifications are created with experiment.py files, each located in the experiment repo: benchpark/experiments/${Benchmark1}.
If you are adding experiments to an existing benchmark, it is best to extend the current experiment.py for that benchmark in the experiment repo.
If you are adding experiments to a benchmark you created, create a new folder for your benchmark in the experiment repo, and put your new experiment.py inside of it.
These experiment.py files inherit from the Experiment base class in /lib/benchpark/experiment.py shown below. When used in conjunction with the system configuration files and package/application repositories, they are used to generate a set of concrete Ramble experiments for the target system and programming model.
# Copyright 2023 Lawrence Livermore National Security, LLC and other
# Benchpark Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: Apache-2.0
from typing import Dict
import yaml # TODO: some way to ensure yaml available
from benchpark.error import BenchparkError
from benchpark.directives import ExperimentSystemBase
from benchpark.directives import variant
import benchpark.spec
import benchpark.paths
import benchpark.repo
import benchpark.runtime
import benchpark.variant
bootstrapper = benchpark.runtime.RuntimeResources(benchpark.paths.benchpark_home)
bootstrapper.bootstrap()
import ramble.language.language_base # noqa
import ramble.language.language_helpers # noqa
class ExperimentHelper:
    def __init__(self, exp):
        self.spec = exp.spec
        self.variables = {}
        self.env_vars = {
            "set": {},
            "append": [{"paths": {}, "vars": {}}],
        }

    def compute_include_section(self):
        return []

    def compute_config_section(self):
        return {}

    def compute_modifiers_section(self):
        return []

    def compute_applications_section(self):
        return {}

    def compute_package_section(self):
        return {}

    def get_helper_name_prefix(self):
        return None

    def get_spack_variants(self):
        return None

    def compute_variables_section(self):
        return {}

    def set_environment_variable(self, name, value):
        """Set value of environment variable"""
        self.env_vars["set"][name] = value

    def append_environment_variable(self, name, value, target="paths"):
        """Append to existing environment variable PATH ('paths') or other variable ('vars')
        Matches expected ramble format. Example:
        https://ramble.readthedocs.io/en/latest/workspace_config.html#environment-variable-control
        """
        self.env_vars["append"][0][target][name] = value

    def compute_config_variables(self):
        pass

    def compute_config_variables_wrapper(self):
        self.compute_config_variables()
        return self.variables, self.env_vars
class SingleNode:
    variant(
        "single_node",
        default=True,
        description="Single node execution mode",
    )

    class Helper(ExperimentHelper):
        def get_helper_name_prefix(self):
            return "single_node" if self.spec.satisfies("+single_node") else ""
class Experiment(ExperimentSystemBase, SingleNode):
    """This is the superclass for all benchpark experiments.

    ***The Experiment class***

    Experiments are written in pure Python.

    There are two main parts of a Benchpark experiment:

      1. **The experiment class**. Classes contain ``directives``, which are
         special functions, that add metadata (variants) to packages (see
         ``directives.py``).

      2. **Experiment instances**. Once instantiated, an experiment is
         essentially a collection of files defining an experiment in a
         Ramble workspace.
    """

    #
    # These are default values for instance variables.
    #

    # This allows analysis tools to correctly interpret the class attributes.
    variants: Dict[
        "benchpark.spec.Spec",
        Dict[str, benchpark.variant.Variant],
    ]

    variant(
        "package_manager",
        default="spack",
        values=("spack", "environment-modules"),
        description="package manager to use",
    )

    variant(
        "append_path",
        default=" ",
        description="Append to environment PATH during experiment execution",
    )

    def __init__(self, spec):
        self.spec: "benchpark.spec.ConcreteExperimentSpec" = spec
        super().__init__()
        self.helpers = []
        self._spack_name = None
        self._ramble_name = None

        for cls in self.__class__.mro()[1:]:
            if cls is not Experiment and cls is not object:
                if hasattr(cls, "Helper"):
                    helper_instance = cls.Helper(self)
                    self.helpers.append(helper_instance)

        self.name = self.spec.name

        if "workload" in self.spec.variants:
            self.workload = self.spec.variants["workload"]
        else:
            raise BenchparkError(f"No workload variant defined for package {self.name}")

        self.package_specs = {}
    @property
    def spack_name(self):
        """The name of the spack package that is used to build this benchmark"""
        return self._spack_name

    @spack_name.setter
    def spack_name(self, value: str):
        self._spack_name = value

    @property
    def ramble_name(self):
        """The name of the ramble application associated with this benchmark"""
        return self._ramble_name

    @ramble_name.setter
    def ramble_name(self, value: str):
        self._ramble_name = value

    def compute_include_section(self):
        # include the config directory
        return ["./configs"]

    def compute_config_section(self):
        # default configs for all experiments
        default_config = {
            "deprecated": True,
        }
        if self.spec.variants["package_manager"][0] == "spack":
            default_config["spack_flags"] = {
                "install": "--add --keep-stage",
                "concretize": "-U -f",
            }
        return default_config

    def compute_modifiers_section(self):
        return []

    def compute_modifiers_section_wrapper(self):
        # by default we use the allocation modifier and no others
        modifier_list = [{"name": "allocation"}, {"name": "exit-code"}]
        modifier_list += self.compute_modifiers_section()
        for cls in self.helpers:
            modifier_list += cls.compute_modifiers_section()
        return modifier_list
    def add_experiment_name_prefix(self, prefix):
        self.expr_name = [prefix] + self.expr_name

    def add_experiment_variable(self, name, values, use_in_expr_name=False):
        self.variables[name] = values
        if use_in_expr_name:
            self.expr_name.append(f"{{{name}}}")

    def set_environment_variable(self, name, values):
        """Set value of environment variable"""
        self.env_vars["set"][name] = values

    def append_environment_variable(self, name, values, target="paths"):
        """Append to existing environment variable PATH ('paths') or other variable ('vars')
        Matches expected ramble format. Example:
        https://ramble.readthedocs.io/en/latest/workspace_config.html#environment-variable-control
        """
        if target not in ["paths", "vars"]:
            raise ValueError("Invalid target specified. Must be 'paths' or 'vars'.")
        self.env_vars["append"][0][target][name] = values

    def zip_experiment_variables(self, name, variable_names):
        self.zips[name] = list(variable_names)

    def matrix_experiment_variables(self, variable_names):
        if isinstance(variable_names, str):
            self.matrix.append(variable_names)
        elif isinstance(variable_names, list):
            self.matrix.extend(variable_names)
        else:
            raise ValueError("Variable list must be of type str or list[str].")

    def add_experiment_exclude(self, exclude_clause):
        self.excludes.append(exclude_clause)

    def compute_applications_section(self):
        raise NotImplementedError(
            "Each experiment must implement compute_applications_section"
        )
    def compute_applications_section_wrapper(self):
        self.expr_name = []
        self.env_vars = {
            "set": {},
            "append": [{"paths": {}, "vars": {}}],
        }
        self.variables = {}
        self.zips = {}
        self.matrix = []
        self.excludes = []

        for cls in self.helpers:
            variables, env_vars = cls.compute_config_variables_wrapper()
            self.variables |= variables
            self.env_vars["set"] |= env_vars["set"]
            self.env_vars["append"][0] |= env_vars["append"][0]

        self.compute_applications_section()

        expr_helper_list = []
        for cls in self.helpers:
            helper_prefix = cls.get_helper_name_prefix()
            if helper_prefix:
                expr_helper_list.append(helper_prefix)
        expr_name_suffix = "_".join(expr_helper_list + self.expr_name)

        expr_setup = {
            "variants": {"package_manager": self.spec.variants["package_manager"][0]},
            "env_vars": self.env_vars,
            "variables": self.variables,
            "zips": self.zips,
            "matrix": self.matrix,
            "exclude": ({"where": self.excludes} if self.excludes else {}),
        }

        workloads = {}
        for workload in self.workload:
            expr_name = f"{self.name}_{workload}_{expr_name_suffix}"
            workloads[workload] = {
                "experiments": {
                    expr_name: expr_setup,
                }
            }

        return {
            self.name: {
                "workloads": workloads,
            }
        }
    def add_package_spec(self, package_name, spec=None):
        if spec:
            self.package_specs[package_name] = {
                "pkg_spec": spec[0],
            }
        else:
            self.package_specs[package_name] = {}

    def compute_package_section(self):
        raise NotImplementedError(
            "Each experiment must implement compute_package_section"
        )

    def compute_package_section_wrapper(self):
        pkg_manager = self.spec.variants["package_manager"][0]

        for cls in self.helpers:
            cls_package_specs = cls.compute_package_section()
            if cls_package_specs and "packages" in cls_package_specs:
                self.package_specs |= cls_package_specs["packages"]

        self.compute_package_section()

        if self.name not in self.package_specs:
            raise BenchparkError(
                f"Package section must be defined for application package {self.name}"
            )

        if pkg_manager == "spack":
            spack_variants = list(
                filter(
                    lambda v: v is not None,
                    (cls.get_spack_variants() for cls in self.helpers),
                )
            )
            self.package_specs[self.name]["pkg_spec"] += " ".join(
                spack_variants
            ).strip()
        elif pkg_manager == "environment-modules":
            if "append_path" in self.spec.variants:
                self.append_environment_variable(
                    "PATH", self.spec.variants["append_path"][0]
                )

        return {
            "packages": {k: v for k, v in self.package_specs.items() if v},
            "environments": {self.name: {"packages": list(self.package_specs.keys())}},
        }
    def compute_variables_section(self):
        return {}

    def compute_variables_section_wrapper(self):
        # For each helper class compute any additional variables
        additional_vars = {}
        for cls in self.helpers:
            additional_vars.update(cls.compute_variables_section())
        return additional_vars

    def compute_ramble_dict(self):
        # This can be overridden by any subclass that needs more flexibility
        ramble_dict = {
            "ramble": {
                "include": self.compute_include_section(),
                "config": self.compute_config_section(),
                "modifiers": self.compute_modifiers_section_wrapper(),
                "applications": self.compute_applications_section_wrapper(),
                "software": self.compute_package_section_wrapper(),
            }
        }
        # Add any variables from helper classes if necessary
        additional_vars = self.compute_variables_section_wrapper()
        if additional_vars:
            ramble_dict["ramble"].update({"variables": additional_vars})
        return ramble_dict

    def write_ramble_dict(self, filepath):
        ramble_dict = self.compute_ramble_dict()
        with open(filepath, "w") as f:
            yaml.dump(ramble_dict, f)
Some or all of the functions in the Experiment base class can be overridden to define custom behavior, such as adding experiment variants.
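As a rough illustration of overriding compute_applications_section, the sketch below mirrors only the add_experiment_variable logic and the experiment-name pattern from the listing above. SketchExperiment, Baz, experiment_names, and the variable names and values are hypothetical stand-ins chosen for this example, not part of Benchpark, and the helper prefixes that the real wrapper prepends to the name are left out.

```python
class SketchExperiment:
    """Hypothetical stand-in for Experiment; only the naming logic is mirrored."""

    def __init__(self, name, workloads):
        self.name = name
        self.workload = workloads
        self.expr_name = []
        self.variables = {}

    def add_experiment_variable(self, name, values, use_in_expr_name=False):
        # Same logic as Experiment.add_experiment_variable in the listing above.
        self.variables[name] = values
        if use_in_expr_name:
            self.expr_name.append(f"{{{name}}}")

    def experiment_names(self):
        # Simplified from compute_applications_section_wrapper: helper
        # prefixes (such as "single_node") are not included here.
        self.expr_name = []
        self.variables = {}
        self.compute_applications_section()
        suffix = "_".join(self.expr_name)
        return [f"{self.name}_{w}_{suffix}" for w in self.workload]


class Baz(SketchExperiment):
    def compute_applications_section(self):
        # Variable names and values are illustrative assumptions.
        self.add_experiment_variable("n_ranks", [1, 2, 4], use_in_expr_name=True)
        self.add_experiment_variable("n_threads_per_proc", 8)


baz = Baz("baz", ["problem1"])
names = baz.experiment_names()
print(names)
```

Variables registered with use_in_expr_name=True appear as {name} placeholders in the experiment name, so the three n_ranks values end up as three distinctly named experiments once the placeholder is expanded.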
compute_package_section
In compute_package_section, add the benchmark's package spec. Required packages for the benchmark should be defined in the package.py.
ADDITIONAL_SPECS should be specifications that the experiment always uses, such as +mpi, e.g. amg2023@{app_version} +mpi.
def compute_package_section(self):
    app_version = self.spec.variants["version"][0]
    self.add_package_spec(self.name, [f"BENCHMARK@{app_version} [ADDITIONAL_SPECS]"])
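To make the effect of add_package_spec more concrete, here is a minimal sketch that copies its logic, and the final return value of compute_package_section_wrapper, from the base class listing. PackageSpecSketch, the plain version attribute, the "develop" version string, and the extra "hypre" entry are assumptions made for illustration. The real class reads the version from self.spec.variants and also merges helper packages and Spack variants.

```python
class PackageSpecSketch:
    """Hypothetical stand-in that mirrors only the package bookkeeping."""

    def __init__(self, name, version):
        self.name = name
        self.version = version  # the real class uses self.spec.variants["version"][0]
        self.package_specs = {}

    def add_package_spec(self, package_name, spec=None):
        # Mirrors Experiment.add_package_spec: spec is a list and only its
        # first element is stored.
        if spec:
            self.package_specs[package_name] = {"pkg_spec": spec[0]}
        else:
            self.package_specs[package_name] = {}

    def compute_package_section(self):
        self.add_package_spec(self.name, [f"amg2023@{self.version} +mpi"])
        # A package registered without a spec (illustrative assumption).
        self.add_package_spec("hypre")

    def package_section(self):
        # Mirrors the return value of compute_package_section_wrapper.
        self.compute_package_section()
        return {
            "packages": {k: v for k, v in self.package_specs.items() if v},
            "environments": {self.name: {"packages": list(self.package_specs.keys())}},
        }


section = PackageSpecSketch("amg2023", "develop").package_section()
print(section)
```

Packages added without a spec are listed in the environment but omitted from the packages mapping, because empty entries are filtered out of the returned dictionary.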
Variants
Variants of the experiment can be added to utilize the different programming models used for on-node parallelization. For example, benchpark/experiments/amg2023/experiment.py can be updated to inherit from different experiment classes, each of which adds a variant such as cuda for an experiment using CUDA (on an NVIDIA GPU) or openmp for an experiment using OpenMP (on a CPU):
class Amg2023(
    Experiment,
    OpenMPExperiment,
    CudaExperiment,
    ROCmExperiment,
    StrongScaling,
    WeakScaling,
    ThroughputScaling,
    Caliper,
):
Multiple types of experiments can be created using variants as well (e.g., strong scaling, weak scaling). See AMG2023 or Kripke for examples.
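The base class discovers the nested Helper class of each parent by walking the method resolution order, as Experiment.__init__ does in the listing above. The following self-contained sketch reproduces that pattern with hypothetical stand-ins (HelperBase, Base, OpenMPSketch, StrongSketch, and an enabled set used in place of a real spec), so it runs without Benchpark installed.

```python
class HelperBase:
    """Hypothetical stand-in for ExperimentHelper."""

    def __init__(self, exp):
        self.enabled = exp.enabled

    def get_helper_name_prefix(self):
        return None


class Base:
    """Hypothetical stand-in for Experiment; only helper discovery is mirrored."""

    def __init__(self, enabled):
        self.enabled = enabled  # set of enabled variant names (assumption)
        self.helpers = []
        # Same loop shape as Experiment.__init__: skip the class itself,
        # the base class, and object, then instantiate each nested Helper.
        for cls in self.__class__.mro()[1:]:
            if cls is not Base and cls is not object:
                if hasattr(cls, "Helper"):
                    self.helpers.append(cls.Helper(self))

    def name_prefixes(self):
        # Empty or None prefixes are dropped, as in the applications wrapper.
        prefixes = (h.get_helper_name_prefix() for h in self.helpers)
        return [p for p in prefixes if p]


class OpenMPSketch:
    class Helper(HelperBase):
        def get_helper_name_prefix(self):
            return "openmp" if "openmp" in self.enabled else ""


class StrongSketch:
    class Helper(HelperBase):
        def get_helper_name_prefix(self):
            return "strong" if "strong" in self.enabled else ""


class Baz(Base, OpenMPSketch, StrongSketch):
    pass


both = Baz({"openmp", "strong"})
only_openmp = Baz({"openmp"})
print("_".join(both.name_prefixes()))
print("_".join(only_openmp.name_prefixes()))
```

Because helpers are collected in MRO order, the order of the parent classes in the class statement determines the order of the prefixes in the generated experiment name.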
Once an experiment class has been written, an experiment is initialized with the following command. Boolean variants defined in your experiment.py are passed with +/~, and string variants are passed as key-value pairs: ``benchpark experiment init --dest {path/to/dest} {benchmark_name} +/~{boolean variant} {string variant}={value}``
For example, to run the AMG2023 strong scaling experiment for problem 1 using CUDA, the command would be:
benchpark experiment init --dest amg2023_experiment amg2023 +cuda workload=problem1 +strong ~single_node
Initializing an experiment generates the following files:
ramble.yaml defines the Ramble specs for building, running, analyzing and archiving experiments.
execution_template.tpl serves as a template for the final experiment script that will be concretized and executed.
A detailed description of Ramble configuration files is available at Ramble workspace_config.
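For orientation, the top-level layout that compute_ramble_dict assembles before it is written to ramble.yaml can be sketched as follows. The function name compute_ramble_dict_sketch and its arguments are hypothetical, and the applications and software sections are left as empty placeholders. The include, config, and modifiers values are the defaults from the base class listing. JSON is used here only to keep the sketch within the standard library, whereas Benchpark itself calls yaml.dump.

```python
import json


def compute_ramble_dict_sketch(name, workload, package_manager="spack"):
    """Hypothetical helper showing the default top-level keys of the dict."""
    config = {"deprecated": True}
    if package_manager == "spack":
        # Default Spack flags from Experiment.compute_config_section.
        config["spack_flags"] = {
            "install": "--add --keep-stage",
            "concretize": "-U -f",
        }
    return {
        "ramble": {
            "include": ["./configs"],
            "config": config,
            "modifiers": [{"name": "allocation"}, {"name": "exit-code"}],
            # Placeholders: the real sections are filled by the
            # compute_*_section_wrapper methods.
            "applications": {name: {"workloads": {workload: {"experiments": {}}}}},
            "software": {"packages": {}, "environments": {name: {"packages": []}}},
        }
    }


ramble_dict = compute_ramble_dict_sketch("amg2023", "problem1")
modules_dict = compute_ramble_dict_sketch("amg2023", "problem1", "environment-modules")
print(json.dumps(ramble_dict, indent=2))
```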
For more advanced usage, such as customizing hardware allocation or performance profiling, see Benchpark Modifiers.
Validating the Benchmark/Experiment
To manually validate that your new experiments work, initialize an existing system and run your experiments. For example, if you just created a benchmark baz with OpenMP and strong scaling variants, it may look like this:
benchpark system init --dest=genericx86-system genericx86
benchpark experiment init --dest=baz-benchmark baz +openmp +strong ~single_node
benchpark setup ./baz-benchmark ./genericx86-system workspace/
When this is complete, you have finished the Setting Up a Benchpark Workspace step and can run and analyze the experiments by following the Benchpark output or the steps in Building an Experiment in Benchpark.