CI Developer Guide
This guide is intended for people who want to modify the GitHub/GitLab CI for benchpark.
GitLab
The Benchpark GitLab tests run on LC systems as part of the LC GitLab instance (https://lc.llnl.gov/gitlab).
The goal is to build and run the benchmarks on systems with different programming models, as well as test the functionality of the benchpark library.
GitLab configuration files are located under the .gitlab folder and specified by the .gitlab-ci.yml configuration file:
.gitlab-ci.yml
.gitlab/
    bin/
    tests/
    utils/
.gitlab-ci.yml defines project-wide variables, the job stage pipeline, and the different sets of tests (e.g. nightly, daily). This file also includes some pre-defined utility functions from the LLNL/radiuss-shared-ci project.

.gitlab/bin/ stores “binaries” that are used during CI execution.
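As a rough illustration of the structure .gitlab-ci.yml provides, a minimal sketch is shown below; the variable name, stage names, and include paths are placeholders for illustration and are not the actual contents of the file.

    # Illustrative sketch only -- names and paths are placeholders.
    variables:
      # Project-wide settings shared by every job in the pipeline.
      EXAMPLE_PROJECT_VARIABLE: "value"

    stages:
      - machine-check
      - test
      - report

    include:
      # Pre-defined utility jobs from the radiuss-shared-ci project.
      - project: "radiuss/radiuss-shared-ci"
        ref: "main"
        file: "utils/utils.yml"
      # Test definitions kept under .gitlab/tests/ (described below).
      - local: ".gitlab/tests/nightly.yml"
      - local: ".gitlab/tests/non_shared.yml"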

Fig. 1: Running experiments using a “non-shared” strategy (A) versus a shared allocation strategy (B).
.gitlab/tests/ defines the different types of tests. Add a benchmark to a given test by adding it to the BENCHMARK list variable for the appropriate HOST and VARIANT. Add tests for a new system by defining a new group in the list under the appropriate parallel:matrix: (a rough sketch of such an entry is shown after this list). The HOST must have existing runners on https://lc.llnl.gov/gitlab (managed by admins) in order to actually execute. All available LC instance runners can be found here.

Nightly tests (nightly.yml) define all of the tests that run nightly. All of the tests for all experiments and systems are defined in this file. Tests are run sequentially on a given system, but are parallelized across systems (non-shared, Figure 1A). The main goal of the nightly tests is to exercise all available programming models for as many of the benchmarks in benchpark as possible; the successes/failures on the develop branch are posted to our CDash dashboard. An experiment failing on the dashboard indicates that the experiment will likely also fail if you try building and running it yourself.

Daily tests are split into multiple categories:

- Non-shared tests (non_shared.yml, Figure 1A) operate the same way as the nightly tests and run sequentially.
- Shared Flux tests (shared_flux_clusters.yml, Figure 1B) contain all of the tests that we execute on clusters running system-wide Flux. The testing strategy allocates a single node and then submits all of the tests to that node, which avoids the time spent waiting for an allocation between tests.
- Shared Slurm tests (shared_slurm_clusters.yml, Figure 1B) contain all of the tests that we execute on clusters running Slurm. The strategy for these clusters is similar to the Flux clusters, but first involves starting Flux on the allocated node. This is necessary because testing the benchpark workflow involves submitting a job within a job step, which is not possible using Slurm.
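The following is a hedged sketch of what an entry in one of these test files might look like; the job name, hosts, variants, and the second benchmark name are illustrative placeholders rather than the real file contents.

    # Hypothetical entry in a .gitlab/tests/*.yml file -- all names are placeholders.
    example-daily-test:
      parallel:
        matrix:
          # One group per system; each HOST must have a registered runner
          # on the LC GitLab instance.
          - HOST: "example-cluster-a"
            VARIANT: "rocm"
            # Add a benchmark for this host/variant by appending to this list.
            BENCHMARK: ["saxpy", "another-benchmark"]
          - HOST: "example-cluster-b"
            VARIANT: "openmp"
            BENCHMARK: ["saxpy"]

In this shape, adding a benchmark to an existing system only touches the corresponding BENCHMARK list, while adding a new system means appending another HOST/VARIANT/BENCHMARK group to the matrix.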
.gitlab/utils/ contains various utility functions for:

- Checking machine status (machine_checks.yml)
- Cancelling jobs (cancel-flux.sh and cancel-slurm.sh)
- Defining common rules (rules.yml); a small illustrative sketch follows this list
- A reproducible script for executing an experiment in benchpark (run-experiment.sh)
- Reporting GitLab status to GitHub PRs (status.yml)
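For the common rules in particular, a shared rule set could be shaped roughly as follows; the hidden-job name and the specific conditions are assumptions for illustration, not the actual contents of rules.yml.

    # Hypothetical shared rules -- the name and conditions are placeholders.
    .benchpark-on-mr-or-schedule:
      rules:
        # Run for merge request pipelines ...
        - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
        # ... and for scheduled (e.g. nightly) pipelines on develop.
        - if: '$CI_PIPELINE_SOURCE == "schedule" && $CI_COMMIT_BRANCH == "develop"'

Test jobs can then pull these conditions in with extends rather than repeating them in every file.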
GitHub
Although the GitLab tests cover the most critical step (building and running the benchmarks across multiple LC systems), they do not test all of the benchmarks/systems or library functionality in benchpark. The GitHub tests use the GitHub virtual machine runners to test mostly Python functionality, which is platform-independent. The following is a description of the different types of GitHub testing:
run.yml
Dryruns
Dryruns are similar to the GitLab tests, but only test up to the ramble workspace setup step, using the --dry-run flag, which sets up the ramble workspace but does not build the benchmark. The dryruns are automatically enumerated using the output of the benchpark list command, which lists experiments for all programming models and scaling options as well as enumerating the modifiers. For each programming model that a benchmark implements, a dryrun is executed on every system in benchpark that contains that programming model in self.programming_models.
.github/utils/dryruns.py contains the main script that enumerates all of the dryrun cases and executes them in a subprocess call.

.github/utils/dryrun.sh executes a single dryrun, given a benchmark_spec and a system_spec.

.github/workflows/run.yml defines the dryrunexperiments job that is executed by a GitHub runner. Runs are separated by programming model, scaling type, and modifiers, and are executed in parallel.
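A rough sketch of how such a matrix job could be shaped is shown below; the matrix groups, runner image, and script invocation are assumptions for illustration and do not reproduce the actual workflow.

    # Hypothetical sketch of the dryrun job (under the jobs: key of run.yml).
    dryrunexperiments:
      runs-on: ubuntu-latest
      strategy:
        matrix:
          # Runs are split by programming model, scaling type, and modifiers.
          group: [openmp, cuda, rocm, strong-scaling, modifiers]
      steps:
        - uses: actions/checkout@v4
        - name: Enumerate and run the dryruns for this group
          # dryruns.py enumerates (experiment, system) pairs from `benchpark list`
          # and launches each one via dryrun.sh in a subprocess.
          run: python .github/utils/dryruns.py "${{ matrix.group }}"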
Dryruns are mainly for verifying that a given experiment/system can be initialized based on the programming models and scaling types that have been included in the experiment class. Simple errors such as syntax errors are caught by the linter instead. While much of the testing covered by dryruns is likely redundant, they are relatively inexpensive to run.
Saxpy
There is a single job that builds and runs the saxpy experiment on a GitHub virtual machine runner. This step additionally tests ramble workspace analyze & archive, uploading the binary as a CI cache, and the benchpark functionality for running a pre-built binary (reusing the spack-built binary).
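A minimal sketch of such a job, assuming hypothetical cache paths, keys, and a placeholder driver script (none taken from the actual workflow), might look like:

    # Hypothetical saxpy job -- the cache path, key, and script are placeholders.
    saxpy:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - name: Cache the spack-built saxpy binary
          uses: actions/cache@v4
          with:
            path: saxpy-workspace/install   # placeholder path
            key: saxpy-${{ runner.os }}-${{ hashFiles('experiments/saxpy/**') }}
        - name: Build, run, analyze, and archive the experiment
          run: ./ci-saxpy.sh   # placeholder script name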
Pytest
The Pytest unit tests are designed to cover as many different cases of the benchpark library as possible; they are useful for checking Python object properties that cannot be checked from the command line. Additionally, we can easily check that certain errors are raised under specific conditions to ensure our error checking works properly. Note that our Pytest coverage is not comprehensive on its own, since our other testing (GitLab and the GitHub dryruns) covers many cases.
style.yml - Lint
The linter step checks:

- Python code formatting, using black
- Spelling, using codespell
- Import ordering, using isort
- Python style enforcement, using flake8
- .yaml/.yml file formatting, using yamlfix
CDash
The successes/failures of our GitLab tests are posted to our CDash dashboard. There is a dashboard for the nightly tests on the develop branch, and separate dashboards for each system for the daily PR tests.
The following files are related to CDash:
- CTestGitlab.cmake configures the CTest variables and dashboard names, runs the tests, and submits the results.
- CTestConfig.cmake sets the CDash token and configuration variables.
- CMakeLists.txt enables CTest and adds the gitlab test.
- .gitlab/utils/status.yml contains the logic to run CTest after a test completes and upload the status to the Benchpark CDash dashboard.
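As a rough sketch of that reporting step (the job name, stage, and ctest arguments are illustrative assumptions, not the actual contents of status.yml):

    # Hypothetical reporting job -- names and arguments are placeholders.
    .report-to-cdash:
      stage: report
      # Report both successes and failures.
      when: always
      script:
        # Drive CTest with the dashboard script, which sets the dashboard name,
        # runs the gitlab test, and submits the results to CDash.
        - ctest -VV -S CTestGitlab.cmake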