Built-in profiling configurations¶
Caliper includes built-in configurations for many common performance analysis
tasks. These configurations can be enabled through the ConfigManager API reference
or the CALI_CONFIG
environment variable.
Configuration String Syntax¶
A configuration string for the ConfigManager class or CALI_CONFIG environment variable is a comma-separated list of configs and parameters.
A config is the name of one of Caliper’s built-in measurement configurations, e.g. runtime-report or event-trace. Multiple configs can be specified, separated by comma.
Most configs have optional parameters, e.g. output to name an output file.
Parameters can be specified as a list of key-value pairs in parentheses after the
config name, e.g. runtime-report(output=report.txt,io.bytes=true)
. For boolean
parameters, only the key needs to be added to enable it; for example,
io.bytes is equal to io.bytes=true.
Parameters can also be listed separately in the config string, outside of parentheses.
In that case, the parameter applies to all configs, whereas parameters inside
parentheses only apply to the config where they are listed. For example, in
runtime-report(io.bytes),spot,mem.highwatermark
, the mem.highwatermark option
will be active in both the runtime-report and spot config, whereas io.bytes
will only be active for runtime-report. Configs and parameters can be listed in
any order.
Here is a more complex example:
runtime-report(output=stdout),profile.cuda,mem.highwatermark,event-trace(output=trace.cali,trace.io)
This will print a runtime profile to stdout, including CUDA API calls and memory high-water marks in the profile, and write an event trace with region begin/end and I/O operations into the trace.cali file.
Built-in configs¶
The following list describes the ConfigManager’s built-in configs and their parameters. Note that depending on the Caliper build configuration, not all configs or options may be available in a particular Caliper installation.
- event-trace
Record a trace of region enter/exit events in .cali format. Options:
- event.timestamps
Record event timestamps
- output
Output location (‘stdout’, ‘stderr’, or filename)
- trace.io
Trace I/O events
- trace.mpi
Trace I/O events
- hatchet-region-profile
Record a region time profile for processing with hatchet or cali-query. Options:
- adiak.import_categories
Adiak import categories. Comma-separated list of integers.
- io.bytes
Report I/O bytes written and read
- io.bytes.read
Report I/O bytes read
- io.bytes.written
Report I/O bytes written
- io.read.bandwidth
Report I/O read bandwidth
- io.write.bandwidth
Report I/O write bandwidth
- level
Minimum region level that triggers snapshots
- mem.highwatermark
Record memory high-water mark for regions
- output
Output location (‘stdout’, ‘stderr’, or filename)
- output.format
Output format (‘hatchet’, ‘cali’, ‘json’)
- profile.cuda
Profile CUDA API functions
- profile.mpi
Profile MPI functions
- topdown-counters.all
Raw counter values for Intel top-down analysis (all levels)
- topdown-counters.toplevel
Raw counter values for Intel top-down analysis (top level)
- topdown.all
Top-down analysis for Intel CPUs (all levels)
- topdown.toplevel
Top-down analysis for Intel CPUs (top level)
- use.mpi
Merge results into a single output stream in MPI programs
- hatchet-sample-profile
Record a sampling profile for processing with hatchet. Options:
- adiak.import_categories
Adiak import categories. Comma-separated list of integers.
- source.module
Report source module (.so/.exe) for samples
- source.location
Report source location (file+line) for samples
- source.function
Report source function name for samples
- output
Output location (‘stdout’, ‘stderr’, or filename)
- output.format
Output format (‘hatchet’, ‘cali’, ‘json’)
- callpath
Perform call-stack unwinding
- sample.frequency
Sampling frequency in Hz. Default: 200
- use.mpi
Merge results into a single output stream in MPI programs
- sample-report
Print a sampling profile for the program
- aggregate_across_ranks
Aggregate results across MPI ranks
- callpath
Group by function call path instead of instrumented region
- max_column_width
Maximum column width in the tree display
- output
Output location (‘stdout’, ‘stderr’, or filename)
- print.metadata
Print program metadata (Caliper globals and Adiak data)
- sample.frequency
Sampling frequency in Hz. Default: 200
- source.function
Report source function
- source.location
Report source location (file+line)
- source.module
Report source module (.so/.exe)
- loop-report
Print summary and time-series information for loops. Options:
- aggregate_across_ranks
Aggregate results across MPI ranks
- io.bytes
Report I/O bytes written and read
- io.bytes.read
Report I/O bytes read
- io.bytes.written
Report I/O bytes written
- io.read.bandwidth
Report I/O read bandwidth
- io.write.bandwidth
Report I/O write bandwidth
- iteration_interval
Measure every N loop iterations
- mem.highwatermark
Record memory high-water mark for regions
- output
Output location (‘stdout’, ‘stderr’, or filename)
- summary
Print loop summary
- target_loops
List of loops to target. Default: any top-level loop.
- time_interval
Measure after t seconds
- timeseries
Print time series
- timeseries.maxrows
Max number of rows in timeseries display. Set to 0 to show all. Default: 20.
- topdown.all
Top-down analysis for Intel CPUs (all levels)
- topdown.toplevel
Top-down analysis for Intel CPUs (top level)
- mpi-report
Print time spent in MPI functions
- runtime-report
Print a time profile for annotated regions. Options:
- aggregate_across_ranks
Aggregate results across MPI ranks
- calc.inclusive
Report inclusive instead of exclusive times
- io.bytes
Report I/O bytes written and read
- io.bytes.read
Report I/O bytes read
- io.bytes.written
Report I/O bytes written
- io.read.bandwidth
Report I/O read bandwidth
- io.write.bandwidth
Report I/O write bandwidth
- level
Minimum region level that triggers snapshots
- max_column_width
Maximum column width in the tree display
- mem.highwatermark
Record memory high-water mark for regions
- mpi.message.count
Number of MPI send/recv/collective operations
- mpi.message.size
MPI message size
- output
Output location (‘stdout’, ‘stderr’, or filename)
- profile.cuda
Profile CUDA API functions
- profile.hip
Profile HIP API functions
- profile.kokkos
Profile Kokkos functions
- profile.mpi
Profile MPI functions
- topdown.all
Top-down analysis for Intel CPUs (all levels)
- topdown.toplevel
Top-down analysis for Intel CPUs (top level)
- spot
Record a time profile for the Spot web visualization framework. Options:
- adiak.import_categories
Adiak import categories. Comma-separated list of integers.
- aggregate_across_ranks
Aggregate results across MPI ranks
- io.bytes
Report I/O bytes written and read
- io.bytes.read
Report I/O bytes read
- io.bytes.written
Report I/O bytes written
- io.read.bandwidth
Report I/O read bandwidth
- io.write.bandwidth
Report I/O write bandwidth
- level
Minimum region level that triggers snapshots
- mem.highwatermark
Record memory high-water mark for regions
- output
Output location (‘stdout’, ‘stderr’, or filename)
- profile.cuda
Profile CUDA API functions
- profile.mpi
Profile MPI functions
- topdown-counters.all
Raw counter values for Intel top-down analysis (all levels)
- topdown-counters.toplevel
Raw counter values for Intel top-down analysis (top level)
- topdown.all
Top-down analysis for Intel CPUs (all levels)
- topdown.toplevel
Top-down analysis for Intel CPUs (top level)