Caliper services

Caliper comes with a number of optional modules (services) that provide measurement data or processing and data recording capabilities. The flexible combination and configuration of these services allows you to quickly assemble recording solutions for a wide range of usage scenarios.

You can enable the services required for your measurement with the CALI_SERVICES_ENABLE configuration variable, e.g.:

export CALI_SERVICES_ENABLE=event,recorder,trace

to create event-triggered context traces for an application.

You can use cali-query --help=services to list all available services, and cali-query --help <service name> to list the options for a given service:

$ cali-query --help event
event service:
 Trigger snapshots for Caliper region begin and end events
  CALI_EVENT_TRIGGER (stringlist)
   List of attributes that trigger measurements (optional)
   If true, add begin/end attributes at each event. Increases overhead.
   Region filter to specify regions that will trigger snapshots
   Region filter to specify regions that won't trigger snapshots

The following sections describe the available service modules and their configuration.


The aggregate service accumulates aggregation attributes (e.g., time durations) of snapshots with a similar key, creating a profile.


CALI_AGGREGATE_KEY

Colon-separated list of attributes that form the aggregation key (see below).

Default: Empty (all attributes without the ASVALUE storage property are key attributes).

Aggregation key

The aggregation key defines for which attributes separate (aggregate) snapshot records will be kept. That is, the aggregate service will generate an aggregate snapshot record for each unique combination of key values found in the input snapshots. The values of the aggregation attributes in the input snapshots will be accumulated and appended to the aggregate snapshot. The aggregate snapshot records also include a count attribute that indicates how many input snapshots with the given aggregation key were found. Attributes that are neither aggregation attributes nor part of the aggregation key will not appear in the aggregate snapshot records.

As an example, consider the following program:

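#include <caliper/Annotation.h>

void foo(int c) {
    // Mark the function body; the guard ends the region on scope exit
    cali::Annotation::Guard
      g( cali::Annotation("function").begin("foo") );

    // ...
}

int main()
{
  { // "A" loop: three iterations, foo() is called twice per iteration
    cali::Annotation::Guard
      g( cali::Annotation("loop.id").begin("A") );

    for (int i = 0; i < 3; ++i) {
      cali::Annotation("iteration").set(i);
      foo(1);
      foo(2);
    }
  }

  { // "B" loop: four iterations, foo() is called once per iteration
    cali::Annotation::Guard
      g( cali::Annotation("loop.id").begin("B") );

    for (int i = 0; i < 4; ++i) {
      cali::Annotation("iteration").set(i);
      foo(1);
    }
  }
}

Assuming snapshots are generated from the function annotation and the aggregation key contains the function, loop.id, and iteration attributes, the aggregate service will generate the following aggregate snapshots:

loop.id=A  iteration=0  function=foo  count=2
loop.id=A  iteration=1  function=foo  count=2
loop.id=A  iteration=2  function=foo  count=2
loop.id=B  iteration=0  function=foo  count=1
loop.id=B  iteration=1  function=foo  count=1
loop.id=B  iteration=2  function=foo  count=1
loop.id=B  iteration=3  function=foo  count=1

Removing the iteration attribute from the aggregation key will collapse input snapshots with different iteration values into a single aggregate snapshot:

loop.id=A  function=foo  count=6
loop.id=B  function=foo  count=4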

Aggregation attributes

The aggregate service accumulates values of aggregation attributes in input snapshots with similar aggregation keys. Specifically, it reports the minimum and maximum value, and computes the sum of the aggregation attributes in the input snapshots. Aggregate snapshots include (min|max|sum)#attribute-name attributes with the minimum, maximum, and sum values for each aggregation attribute, respectively.

Note that only attributes with the ASVALUE property can be aggregation attributes.


The following configuration generates a time profile for the function annotation separated by loop id. Note: when using time.inclusive.duration as aggregation attribute, we strongly recommend including the event.end event attributes for all annotations of interest in the aggregation key (e.g., event.end#function in the example), or using the default, empty key.

$ CALI_SERVICES_ENABLE=aggregate,event,recorder,timestamp \
    CALI_EVENT_TRIGGER=function \
    CALI_AGGREGATE_KEY=event.end#function, \
== CALIPER: Registered aggregation service
== CALIPER: Registered event service
== CALIPER: Registered recorder service
== CALIPER: Registered timestamp service
== CALIPER: Initialized
== CALIPER: aggregate: flushed 4 snapshots.
== CALIPER: Wrote 57 records.

The resulting file has the following contents:

event.end#function=foo  count=6  sum#time.inclusive.duration=151
event.end#function=foo  count=4
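As a variation on the example above, the recommendation to use the default key can be followed by leaving CALI_AGGREGATE_KEY unset. This is a minimal sketch that assumes the same services and trigger attribute as above:

```shell
# Same services and trigger attribute as in the example above.
export CALI_SERVICES_ENABLE=aggregate,event,recorder,timestamp
export CALI_EVENT_TRIGGER=function
# Leave the key unset so the default (empty) aggregation key is used.
unset CALI_AGGREGATE_KEY
# Then run the annotated application as usual.
```

With the empty key, all attributes without the ASVALUE property become key attributes, so the event.end attributes are part of the key automatically.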


The alloc service adds data tracking information to Caliper. It records snapshots of allocation calls with their arguments and return values, and resolves the containing allocations of any memory addresses produced by other Caliper services, such as the libpfm service. By default, it will only use data tracking information provided via the Caliper data tracking API, but in conjunction with the sysalloc service it records and/or tracks any allocations by hooking system allocation calls. This service may potentially incur significant amounts of overhead when recording/tracking frequent allocations/deallocations.
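A minimal sketch of enabling allocation tracking for system allocation calls. The alloc and sysalloc service names come from the description above; pairing them with the recorder and trace services is an assumption made here for illustration:

```shell
# alloc provides the tracking; sysalloc hooks system allocation calls.
# recorder and trace are an assumed choice for writing out the snapshots.
export CALI_SERVICES_ENABLE=alloc,sysalloc,recorder,trace
# Then run the application; expect extra overhead if it allocates frequently.
```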


Records snapshots when memory regions are being tracked or untracked, storing the given label in the mem.alloc or attribute, respectively. The snapshots also contain a unique ID for the allocation in the alloc.uid attribute, and the size of the allocated region in the alloc.total_size attribute.

Default: true


When set, snapshots with memory addresses produced by other services (e.g., Libpfm) will be appended with the allocations that contain them. The snapshots then contain alloc.label#address_attribute, alloc.uid#address_attribute, and alloc.index#address_attribute attributes with the memory region label, allocation ID, and array index for the memory address attributes found in the snapshot.

Default: false


Records the amount of active allocated memory, in bytes, at each snapshot, in the attribute.

Default: false


The callpath service uses libunwind to add callpaths to Caliper context snapshots. By default, the callpath service provides call-stack addresses. Set CALI_CALLPATH_USE_NAMES=true to retrieve function names on-line. Call-path addresses are provided in the callpath.address attribute, call-path region names in callpath.regname. Example:

$ export CALI_SERVICES_ENABLE=callpath:event:recorder:trace
$ ./test/cali-basic
$ cali-query -e --print-attributes=callpath.address:callpath.regname
callpath.address=401207/2aaaac052d5d/400fd9,callpath.regname=main/__libc_start_main/_start

The example shows the callpath.address and callpath.regname attributes in Caliper context records.


CALI_CALLPATH_USE_NAMES

Provide region names for call paths. Incurs higher overhead. Note that region names for C++ and Fortran functions are not demangled.

Default: false.
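A minimal sketch of a configuration that records function names in addition to addresses, assuming the same service combination as in the callpath example above:

```shell
export CALI_SERVICES_ENABLE=callpath,event,recorder,trace
# Resolve function names on-line; this increases overhead.
export CALI_CALLPATH_USE_NAMES=true
# Then run the application and inspect callpath.regname with cali-query.
```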


Record region addresses for call paths.

Default: true.


Skip a number of stack frames. This avoids recording stack frames within the Caliper library.

Default: 10


The cupti service records CUDA events and wraps CUDA API calls through the CUpti interface. Specifically, it can intercept runtime API calls, driver API calls, resource creation and destruction events (contexts and streams), and synchronization events. It can also interpret NVTX source-code annotations as Caliper annotations.


String. A comma-separated list of CUpti callback domains to intercept. Values:

  • runtime: The CUDA runtime API, e.g. cudaDeviceSynchronize.

  • driver: The CUDA driver API, e.g. cuInit. This category tends to have significant overheads.

  • resource: Stream and context creation.

  • sync: Synchronization events.

  • nvtx: Interpret NVTX annotations as Caliper annotations.

  • none: Don’t capture callbacks.

Default: runtime,sync


Boolean. Record the kernel symbol name for callbacks (typically when launching kernels). Default: true.


Boolean. Record CUDA context ID. Default: true.

CUpti Attributes

The cupti service adds the following attributes:


Name of CUDA runtime API call. Nested.


Name of CUDA driver API call. Nested.


Resource being created or destroyed. (create_context, destroy_context, create_stream, destroy_stream).


Object being synchronized (context or stream)


Name of NVTX range annotation.


Symbol name of a kernel being launched.


CUDA context ID. Recorded with synchronization and resource events.


CUDA device ID. Recorded with resource and sync events.


CUDA Stream ID. Recorded with stream resource and sync events.

CUpti event sampling (EXPERIMENTAL)

The CUpti service can read hardware “events” from CUDA devices at every snapshot using CUpti’s continuous event collection. However, note that this capability is only available on Tesla devices, and it also tends to have very high overhead.

To activate it, provide the event ID for the event to read in CALI_CUPTI_SAMPLE_EVENT_ID. The possible event IDs can be obtained with the cupti_query program that is provided as a sample with the CUDA/CUpti toolkit. Values will be stored in the cupti.event.EVENT_NAME attribute.

As an example, consider sampling instructions executed on the device. From the cupti_query output, we learn that the event ID for this event is 83886156:

Event# 13
Id        = 83886156
Name      = inst_executed
Shortdesc = inst_executed
Longdesc  = Number of instructions executed per warp.

We can now add instructions executed to Caliper snapshots. The following configuration can do that:
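A minimal sketch of such a configuration. Only the CALI_CUPTI_SAMPLE_EVENT_ID setting is taken from the text above; the service list is an assumption chosen so that the function, cupti.runtimeAPI, and time.duration attributes used in the query below are available:

```shell
# Assumed service list; adjust it to your measurement setup.
export CALI_SERVICES_ENABLE=cupti,event,recorder,timestamp,trace
# Event ID for inst_executed, as reported by cupti_query.
export CALI_CUPTI_SAMPLE_EVENT_ID=83886156
# Then run the CUDA application and query the resulting file.
```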



cali-query -q "select sum(time.duration),sum(cupti.event.inst_executed) group by function,cupti.runtimeAPI format tree"

Path                                     time.duration cupti.event.inst_executed
main                                             42123                         0
  LagrangeLeapFrog                              192340                         0
    CalcTimeConstraintsForElems                  13768                         0
      cudaStreamSynchronize                      65776                         0
      cudaPeekAtLastError                        66714                         0
      cudaLaunch                                 76372                    214944
      cudaSetupArgument                         190416                         0
      cudaConfigureCall                          60859                         0
    CalcEnergyForElems                          695449                         0
      cudaStreamSynchronize                     532691                         0
      cudaPeekAtLastError                      1138736                         0
      cudaLaunch                               1223774                   1766400
      cudaSetupArgument                        1510537                         0
      cudaConfigureCall                         310107                         0


The cuptitrace service records an asynchronous stream of CUDA activities, such as memory copies or kernel executions. CUDA activity records contain the kind of activity, its start and end time, and additional attributes listed below.

Note that the timestamps in CUDA activity records are generated by CUpti, not by Caliper’s timestamp service. The CALI_CUPTITRACE_SNAPSHOT_TIMESTAMPS option collects CUpti timestamps for all Caliper snapshots, allowing one to compare timestamps between the CUDA activity records and host-side Caliper events. With CALI_CUPTITRACE_SNAPSHOT_DURATION, cuptitrace will also calculate the time duration of host-side events based on CUpti timestamps.


The CUpti activity kinds to record. Comma-separated list. Possible values:

  • device: Device info

  • correlation: Correlation records

  • driver: Driver API

  • runtime: Runtime API

  • kernel: CUDA kernels being executed

  • memcpy: CUDA memory copies

  • uvm: Unified memory events

Default: correlation,device,kernel,memcpy,runtime


Add the Caliper context (annotations) from the point where a CUDA activity was launched to the CUDA activity records. Boolean.

Default: true.


CALI_CUPTITRACE_SNAPSHOT_TIMESTAMPS

Add CUpti timestamps to all Caliper snapshot records. Allows one to compare timestamps from CUDA activity records with host-side Caliper events. Boolean.

Default: false


CALI_CUPTITRACE_SNAPSHOT_DURATION

Calculate duration of host-side events using CUpti timestamps. Useful to compare duration of GPU and Host activities. Boolean.

Default: false
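A minimal sketch of a configuration that makes host-side and GPU timings comparable by using CUpti timestamps for both. The two option names come from the description above; the service list is an assumption:

```shell
# Assumed service list for an event trace with CUDA activity records.
export CALI_SERVICES_ENABLE=cuptitrace,event,recorder,trace
# Attach CUpti timestamps to every Caliper snapshot.
export CALI_CUPTITRACE_SNAPSHOT_TIMESTAMPS=true
# Compute host-side durations from CUpti timestamps as well.
export CALI_CUPTITRACE_SNAPSHOT_DURATION=true
```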


If uvm memory events are enabled, record host-to-device and device-to-host transfers.

Default: true


If uvm memory events are enabled, record CPU and GPU pagefaults.

Default: true

CUptiTrace records contain the following attributes:


Kind of the CUDA activity. Possible values: “memcpy” or “kernel”.


Start timestamp of the CUDA activity (CUpti timestamp).


End timestamp of the CUDA activity (CUpti timestamp).


Duration of the CUDA activity in nanoseconds.

Duration of a host-side activity in nanoseconds.

For kernels, the function name of the kernel.


For memory copies, the direction of the memory transfer (device-to-host [DtoH], host-to-device [HtoD], etc.)


CUpti timestamp at program start.


CUpti timestamp attached to host-side Caliper snapshot records.


A globally unique ID of the CUDA device the activity is running on.


A globally unique ID of the CUDA device the activity is running on.


The kind of unified memory event.


Pagefault address for unified memory events.


Bytes transferred in a unified memory event.


Cause of a unified memory page migration.


Access type that caused a unified memory page fault.

Environment Information

The environment information (env) service collects runtime environment information at process startup and adds it to the Caliper context.

Specifically, it collects

  • The process’ command line (program name and arguments)

  • Machine type and hostname, and operating system type, release, and version

  • Date and time of program start in text form

Moreover, the environment information service can put any environment variable defined at program start on the Caliper blackboard.


List of extra environment variables to import.

Default: empty


The event trigger service triggers snapshots when attributes are updated. It is possible to select a list of snapshot-triggering attributes, or have any attribute update trigger snapshots. Updates of attributes with the CALI_ATTR_SKIP_EVENTS property will never trigger snapshots.

If CALI_EVENT_ENABLE_SNAPSHOT_INFO is enabled, snapshots triggered by the event service include an attribute which describes the event that triggered the snapshot, in the following form:

event.<begin|set|end>#<attribute name>=<value>

For example, a snapshot triggered by the call cali_set_int_byname("my.iteration", 42); includes the attribute event.set#my.iteration=42 to describe the triggering event. Example:

$ export CALI_SERVICES_ENABLE=event,recorder,trace
$ ./test/cali-basic
$ cali-query -e 150819-113409_40027_W5Z0mWvoJUyn.cali

By setting CALI_EVENT_TRIGGER, we can configure the example to only trigger snapshots for “iteration” attribute updates:

$ export CALI_SERVICES_ENABLE=event,recorder,trace
$ export CALI_EVENT_TRIGGER=iteration
$ ./test/cali-basic
$ cali-query -e 150819-113409_40027_W5Z0mWvoJUyn.cali

Value filtering

With filtering, the event service only triggers snapshots for specific values or patterns (e.g., region names). You can provide include or exclude filters. There are three pattern types:


Match the exact value


Match the start of the string


Match a regular expression

You can specify multiple patterns and combine them as needed, e.g. to include only “important_region” as well as any region starting with MPI_ or mylib_:

$ CALI_EVENT_INCLUDE_REGIONS="match(important_region),startswith(MPI_,mylib_)"

Config variables


CALI_EVENT_TRIGGER

List of attributes that trigger measurement snapshots. If empty, all user attributes trigger snapshots.

Default: empty


CALI_EVENT_ENABLE_SNAPSHOT_INFO

Boolean. Generate the event.begin#attr etc. attributes for each event snapshot. Turning this off can decrease runtime overheads.

Default: true
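A minimal sketch that combines the trigger list with the snapshot-info switch to keep event overhead low. It assumes the trace setup from the examples above and an application that sets an "iteration" attribute:

```shell
export CALI_SERVICES_ENABLE=event,recorder,trace
# Only updates of the "iteration" attribute trigger snapshots.
export CALI_EVENT_TRIGGER=iteration
# Skip the event.<begin|set|end>#... attributes to reduce overhead.
export CALI_EVENT_ENABLE_SNAPSHOT_INFO=false
```

Keep the snapshot info enabled when later processing steps rely on the event.end attributes, for example as aggregation key entries.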


Minimum region level that triggers snapshots. Default: 0


CALI_EVENT_INCLUDE_REGIONS

Specify a value filter to only trigger snapshots for the provided patterns. See above for the different pattern options.

Default: empty (no filter)


Like above, but defines a value filter to skip snapshots for the provided patterns.

Default: empty (no filter)


Specifies branches by name (using a pattern) to measure.


The debug service prints an event log on the selected Caliper log stream. This is useful to debug source-code annotations. Note that you need to set Caliper’s verbosity level to at least 2 to see the log output.
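A minimal sketch of a matching configuration. CALI_LOG_VERBOSITY is not named in this section; it is assumed here to be the variable that controls Caliper's verbosity level:

```shell
export CALI_SERVICES_ENABLE=debug
# Assumed name of the verbosity setting; level 2 or higher shows the event log.
export CALI_LOG_VERBOSITY=2
```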


$ ./test/cali-basic
== CALIPER: Available services: callpath papi debug event pthread recorder timestamp mpi
== CALIPER: Registered debug service
== CALIPER: Initialized
== CALIPER: Event: create_attribute (attr = phase)
== CALIPER: Event: pre_begin (attr = phase)
== CALIPER: Event: pre_begin (attr = phase)
== CALIPER: Event: pre_end (attr = phase)
== CALIPER: Event: pre_begin (attr = phase)
== CALIPER: Event: create_attribute (attr = iteration)
== CALIPER: Event: pre_set (attr = iteration)
== CALIPER: Event: pre_set (attr = iteration)
== CALIPER: Event: pre_set (attr = iteration)
== CALIPER: Event: pre_set (attr = iteration)
== CALIPER: Event: pre_end (attr = iteration)
== CALIPER: Event: pre_end (attr = phase)
== CALIPER: Event: pre_end (attr = phase)
== CALIPER: Event: finish
== CALIPER: Finished


The io service wraps POSIX IO calls (open, close, read, write) and collects information about the number of bytes read and written through these IO calls, as well as the filesystems and mount points read from or written to. IO operations trigger io.region begin and end events. It provides the following attributes:


An I/O region. Either “metadata”, “read” or “write”.


The filesystem type targeted by the IO operation.


The mount point targeted by the IO operation.

Bytes read in the IO operation.


Bytes written in the IO operation.


The libpfm service performs per-thread event-based sampling. The user may configure the event upon which to sample, the values to record for each sample, and the sampling period.


Comma-separated list of events to sample. Event names are resolved through libpfm, and may include software and hardware events (see libpfm’s showevtinfo tool to obtain a list of events available on a particular system).

Default: cycles


Whether to record event samples. If set, will trigger a snapshot containing all sampled attributes listed in CALI_LIBPFM_SAMPLE_ATTRIBUTES after CALI_SAMPLE_PERIOD events have occurred.

Default: true


If set, counter values of all active events will be recorded at every Caliper snapshot.

Default: true


CALI_LIBPFM_SAMPLE_ATTRIBUTES

Comma-separated list of attributes to record for each sample.

Available entries are:

ip           Instruction pointer
id           Sample id
stream_id    Stream id
time         Timestamp
tid          Thread id
period       Current sampling period
cpu          CPU
transaction  Type of transaction
addr         Data address*
weight       Latency*
data_src     Encoding for memory resource (L1, L2, DRAM etc.)*

*available only for certain events.

Default: ip,time,tid,cpu
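A minimal sketch that extends the default attribute list with the data address and latency entries from the list above. The service list is an assumption, and addr and weight are only recorded for events that support them:

```shell
# Assumed service list for writing a sample trace.
export CALI_SERVICES_ENABLE=libpfm,recorder,trace
# Default attributes plus data address and latency.
export CALI_LIBPFM_SAMPLE_ATTRIBUTES=ip,time,tid,cpu,addr,weight
```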


Sampling period for each event (valid when sampling is enabled). When set to a value N, a sample will be recorded after every N occurrences of the event.

For multiple events, this should be a comma-separated list for the periods of respective events.

Default: 20000000


Whether to set (precise) for events that support precise ip. Some events require (precise) to be set, for others it is optional (see output of libpfm’s showevtinfo tool to determine when it is available or required).

For multiple events, this should be a comma-separated list for the precise_ip values of respective events.

May be set to either 0, 1, or 2.

Default: 0


Extra event configuration. Some events require an additional parameter to configure behavior, such as latency threshold (see output of libpfm’s showevtinfo tool to determine when it is available or required).

For multiple events, this should be a comma-separated list for the config1 values of respective events.

Default: 0

The following example shows how to configure PEBS memory access sampling with a latency threshold (available on SandyBridge, IvyBridge, Haswell):



The MPI service records MPI operations and the MPI rank. Use it to keep track of the program execution spent in MPI. You can select the MPI functions to track by setting CALI_MPI_WHITELIST or CALI_MPI_BLACKLIST filters. By default, no MPI functions are instrumented.

MPI function names are stored in the mpi.function attribute, and the MPI rank in the mpi.rank attribute.


CALI_MPI_WHITELIST

Comma-separated list of MPI functions to instrument. Only whitelisted functions will be instrumented.


CALI_MPI_BLACKLIST

Comma-separated list of MPI functions that will be filtered. If a blacklist has been set, all functions except for the ones in the blacklist will be instrumented. If both whitelist and blacklist are set, only whitelisted functions will be instrumented, and the blacklist will be applied to the whitelisted functions.
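A minimal sketch of a whitelist configuration. The function selection and the service list are illustrative assumptions; only the variable names come from the description above:

```shell
# Assumed service list for an event trace that includes MPI regions.
export CALI_SERVICES_ENABLE=event,mpi,recorder,trace
# Instrument only these two MPI functions.
export CALI_MPI_WHITELIST=MPI_Bcast,MPI_Allreduce
# No blacklist: every whitelisted function stays instrumented.
unset CALI_MPI_BLACKLIST
```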

MPI message tracing (EXPERIMENTAL)

The MPI service can record communication information about point-to-point messages being sent and received, as well as collective communications. When enabled, message tracing will create snapshot records for individual point-to-point messages sent or received and for collective operations a process participates in.


Enable message tracing. Default: false


  • Communication records will only be created for MPI functions that are instrumented (i.e., they must be listed in CALI_MPI_WHITELIST, and must not be listed in CALI_MPI_BLACKLIST).

  • This feature is experimental. Many implementation aspects such as attribute names and the information being recorded can change in future versions.

  • Caliper does not synchronize timestamps between MPI ranks, i.e. timestamps taken on different ranks may not be directly comparable.

Message tracing creates three types of records:

  • Point-to-point message sent. Contains message destination, size, tag and communicator info.

  • Point-to-point message received. Contains message source, size, tag, and communicator info.

  • Collective communication. Contains collective type, amount of bytes sent, and communicator info.

Specifically, this information is encoded in the following attributes:

Integer. A unique ID for the MPI call the communication happened in. Can be used to associate the communication with the surrounding begin/end records for the MPI function. For MPI calls that process multiple messages (e.g. MPI_Waitall), the records for all communications completed within the same function call have the same ID.


Integer. Source rank, in the local communicator, of a received message. Indicates a message received event.


Integer. Destination rank, in the local communicator, of a message being sent. Indicates a message sent event.


Integer. Tag of a message sent or received.


Integer. Size of message being sent or received in a point-to-point message, or the amount of data sent in a collective communication.


Integer. The type of a collective communication. Indicates a collective communication event. Possible values: 1: Barrier. 2: N-to-N (e.g., MPI_Allgather). 3: 1-to-N (e.g., MPI_Bcast). 4: N-to-1 (e.g., MPI_Gather).


Integer. Root rank, in the local communicator, of a 1-to-N or N-to-1 collective communication.


Integer. Unique ID for the communicator on which this communication occurs.


Boolean. Present and set to true if the communicator on which this communication occurs is congruent to MPI_COMM_WORLD. This applies to MPI_COMM_WORLD itself and any duplicate of MPI_COMM_WORLD. If true, then the local source, destination, or collective root rank in the record is identical to its world rank; otherwise it is not.


Size of the communicator on which this communication occurs.


Lists the world ranks present in the local communicator. Currently encoded as a binary array.

Currently, we record communication information for the following MPI functions:

Function / function group


MPI_Send and MPI_Isend


Message sent


Message received

MPI_Start, MPI_Startall

Message(s) sent

MPI_Sendrecv, MPI_Sendrecv_replace

Message sent, message received

MPI_Wait variants

Message(s) received

MPI_Test variants

Message(s) received (for completed receive requests)


Collective (barrier)

MPI_Bcast, MPI_Scatter, MPI_Scatterv

Collective (1-to-n)

MPI_Reduce variants, MPI_Gather, MPI_Gatherv

Collective (n-to-1)

MPI_Allgather, MPI_Allreduce, MPI_Alltoall, MPI_Allgatherv, MPI_Alltoallv, MPI_Reduce_scatter_block, MPI_Scan, MPI_Exscan

Collective (n-to-n)

We currently do not cover:

  • MPI_Alltoallw

  • Non-blocking and neighborhood collectives

  • I/O

  • One-sided communication

  • Process creation and management

MPI Report

The MPI report service (mpireport) aggregates, formats, and writes collected Caliper records from all ranks in an MPI program into a single global report. By default, the mpireport service prints a tabular, human-readable report of the collected snapshots. Users can provide a query specification in CalQL syntax to define filter, aggregation, and formatting options.

The mpireport service aggregates Caliper data across MPI processes before printing it. This happens on every Caliper flush event. Enabling the mpireport service will trigger a flush during MPI_Finalize.

There are two aggregation steps: the first step aggregates data locally on each MPI rank, the second step aggregates data across MPI ranks. Results from the first (local) step are the input for the second (cross-rank) aggregation step. Use CALI_MPIREPORT_LOCAL_CONFIG to define a local aggregation specification. If none is given, the specification given in CALI_MPIREPORT_CONFIG will be used for both the local and cross-rank aggregation.

The mpi service must be enabled for mpireport to work.


File name of the output file. May be set to stdout or stderr to print to the standard output or error streams, respectively.

Similar to the recorder service, the file name may contain fields which will be substituted by attribute values (see recorder service description).

Default: stdout


CALI_MPIREPORT_CONFIG

An aggregation and formatting specification in CalQL syntax (The CalQL query language). Defines the cross-rank aggregation operation and output formatting.

Default: empty; all attributes in the snapshots will be printed.


CALI_MPIREPORT_LOCAL_CONFIG

An aggregation specification in CalQL syntax for the first (local) aggregation step on each rank.

Default: empty; use the cross-rank aggregation specification also for the local aggregation step.

Example: Measure time in Caliper regions, compute inclusive times locally, then compute the average inclusive time per MPI rank:

CALI_MPIREPORT_LOCAL_CONFIG="select inclusive_sum(sum#time.duration) group by prop:nested"
CALI_MPIREPORT_CONFIG="select avg(inclusive#sum#time.duration) as \"Time (Avg)\" group by prop:nested format tree"
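The two query settings above need a service configuration around them. A minimal sketch follows, where the service list is an assumption: mpi is required by mpireport, while aggregate, event, and timestamp are assumed to supply the sum#time.duration values per region.

```shell
# Assumed service list; mpi must be enabled for mpireport to work.
export CALI_SERVICES_ENABLE=aggregate,event,mpi,mpireport,timestamp
# Step 1 (per rank): compute inclusive times locally.
export CALI_MPIREPORT_LOCAL_CONFIG='select inclusive_sum(sum#time.duration) group by prop:nested'
# Step 2 (across ranks): average the local inclusive times.
export CALI_MPIREPORT_CONFIG='select avg(inclusive#sum#time.duration) as "Time (Avg)" group by prop:nested format tree'
```

Single quotes around the CalQL strings avoid having to escape the double quotes of the column alias.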


The OMPT service records OpenMP information using the OpenMP tools interface (OMPT). It creates Caliper regions for OpenMP parallel regions, worksharing constructs (e.g., loops), and synchronization regions (e.g., barriers).

To use the ompt service, the OpenMP tools interface must be activated in the OpenMP runtime. The ompt service activates the tools interface automatically if a Caliper channel with the ompt service enabled is created before the OpenMP runtime system is first initialized. If an ompt service instance is only created after the OpenMP runtime system has been initialized, you may have to activate OMPT manually by setting the CALI_USE_OMPT environment variable to “1” or “true”.
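A minimal sketch of the manual activation described above. CALI_USE_OMPT comes from the text; the service list is an assumption:

```shell
# Activate the OpenMP tools interface before the OpenMP runtime starts.
export CALI_USE_OMPT=1
# Assumed service list for tracing OpenMP regions.
export CALI_SERVICES_ENABLE=ompt,event,recorder,trace
```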

The ompt service provides the following attributes:


An OpenMP parallel region. The value is the number of threads inside the parallel region. Only set on the thread that created the parallel region.

An OpenMP worksharing region (“loop” etc.)


An OpenMP synchronization region (e.g., “barrier”)


Number of threads in the current parallel region

Arbitrary ID for the current thread

The OpenMP thread ID of the current thread in the innermost parallel region


The kind of OpenMP thread (“initial” or “worker”)


The PAPI service collects hardware counter information through the PAPI library.


The PAPI counters to read as comma-separated list. Available counters can be found with the papi_avail command provided by PAPI. If successful, snapshots will contain attributes named papi.COUNTER_NAME. Their value represents the increase of that counter since the previous snapshot.


$ CALI_SERVICES_ENABLE=event,papi,trace,report
$ ./test/cali-basic
papi.PAPI_TOT_CYC papi.PAPI_L1_DCM function annotation loop     iteration#mainloop
            36146              431
            28328              372 main
            20601              311 main     init
            37010              546 main
             7147              150 main                mainloop
             8425              115 main                mainloop                  0
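A minimal sketch of a configuration that could produce a report like the one above. The variable name CALI_PAPI_COUNTERS does not appear in this section and is assumed here to be the counter list option; the counter names are taken from the report columns:

```shell
export CALI_SERVICES_ENABLE=event,papi,trace,report
# Assumed option name; counters as shown in the report header.
export CALI_PAPI_COUNTERS=PAPI_TOT_CYC,PAPI_L1_DCM
```

Counter availability differs between systems, so check papi_avail before choosing names.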


The Pthread service wraps pthread_create using GOTCHA, and adds an attribute with a numeric thread ID for the new child thread. In doing so, the pthread service automatically creates a Caliper thread scope on the child thread: this is useful to automatically start sampling (e.g. with the sampler service) on each new thread.


The recorder service writes Caliper snapshot records into a file using a custom text-based I/O format. These files can be read with the cali-query tool.

Writing occurs during a flush phase, which prompts snapshot-buffering services (in particular, the trace or aggregate services) to push out buffered snapshot records for writing. A flush phase can take several seconds and significantly disrupt program performance. By default, Caliper initiates a flush at the end of the program execution.

You can also set the directory and filename that should be used; by default, the recorder service will auto-generate a file name.

CALI_RECORDER_FILENAME=(stdout|stderr|filename pattern)

File name of the output file. May be set to stdout or stderr to print to the standard output or error streams, respectively.

The file name string can contain fields, denoted by %attribute_name%, which will be replaced with attribute values. For example, in an MPI program with the mpi service enabled, the string caliper-%mpi.rank%.cali will create files caliper-0.cali, caliper-1.cali, etc. for each MPI rank. For this to work, the attributes named in the fields need to be set on the blackboard during the flush phase.

Default: not set, auto-generates a unique file name.


Directory to write context trace files to. The directory must exist; Caliper does not create it. Default: not set, use current working directory.
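A minimal sketch of the per-rank file name pattern described above. The service list is an assumption; the pattern is the one from the text:

```shell
# Assumed service list; mpi provides the mpi.rank attribute.
export CALI_SERVICES_ENABLE=event,mpi,recorder,trace
# %mpi.rank% is replaced with the rank during the flush phase.
export CALI_RECORDER_FILENAME='caliper-%mpi.rank%.cali'
```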


The report service aggregates, formats, and writes collected Caliper records into files or stdout on Caliper flush events (typically, at program end). By default, the report service prints a tabular, human-readable report of the collected snapshots. Users can provide a query specification in CalQL syntax to define filter, aggregation, and formatting options.


File name of the output file. May be set to stdout or stderr to print to the standard output or error streams, respectively.

Similar to the recorder service, the file name may contain fields which will be substituted by attribute values (see recorder service description), for example to create individual report-0.txt, report-1.txt etc. files for each rank in a multi-process program.

Default: stdout


CALI_REPORT_CONFIG

A formatting specification in CalQL syntax (The CalQL query language).

Default: empty; all attributes in the snapshots will be printed.

Example: Consider the following report configuration and list of flushed snapshots:

CALI_REPORT_CONFIG="SELECT function,time.duration WHERE phase=loop ORDER BY time.duration FORMAT table"


This configuration will create the following report output:

function time.duration
bar                 12
foo                100
foo               2000

Only snapshots where phase=loop are selected (due to the filter configuration), and the function and time.duration attributes are printed, in ascending order of time.duration.


The roctracer service records AMD ROCm/HIP activities and runtime API calls.

The roctracer activity records contain the following attributes:


Activity type (e.g., KernelExecution, CopyHostToDevice, etc.)


Activity start time (ns)


Activity end time (ns)


Activity duration (ns)


Bytes copied (for memory copies)


Queue ID


Device ID

Kernel function name

Duration of host activities (if CALI_ROCTRACER_SNAPSHOT_DURATION is on)



Boolean. Enable activity tracing.

Default: true


Boolean. Record kernel function names in activity records. Increases overhead.

Default: false


Boolean. Record duration of CPU activities (e.g., instead of using the timestamp service).

Default: false


Boolean. Record timestamps of CPU activities (e.g., instead of using the timestamp service).

Default: false
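A configuration that records kernel names and host-side durations might look like the sketch below. Only CALI_ROCTRACER_SNAPSHOT_DURATION appears in the text above; CALI_ROCTRACER_RECORD_KERNEL_NAMES and the service combination are assumptions to verify with cali-query --help roctracer.

```shell
# Service combination is an assumption; adjust to your recording setup.
export CALI_SERVICES_ENABLE=event,recorder,roctracer,trace
# Assumed name: record kernel function names in activity records (adds overhead).
export CALI_ROCTRACER_RECORD_KERNEL_NAMES=true
# Named in the attribute list above: record the duration of host activities.
export CALI_ROCTRACER_SNAPSHOT_DURATION=true
```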


The RocTX service forwards Caliper annotations as ROCm RocTX ranges for AMD’s rocprof tool.


The sampler service implements time-based sampling. It triggers snapshots at regular intervals. Sampling allows for low-overhead performance data collection, and can provide insights into code regions that are not or only sparsely covered with source-code annotations.

Caliper must be initialized on each thread that should be sampled. This can be done explicitly via the annotation API, or via the pthread service for child threads.


Sampling frequency in Hz. Default: 10

When active, the sampler service triggers snapshots at the specified frequency. Each snapshot triggered by the sampler service contains a cali.sampler.pc attribute with the program address where the target program was interrupted. The symbollookup service can use this address to retrieve the function name as well as source file and line information.

The following example generates a sampling trace at 100Hz, uses the symbollookup service to retrieve function name information, and prints a flat profile of the number of samples per function:

CALI_REPORT_CONFIG="SELECT source.function#cali.sampler.pc,count() GROUP BY source.function#cali.sampler.pc FORMAT table ORDER BY count DESC"
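The report query alone does not enable sampling. A fuller configuration for this scenario might look like the following sketch, where the service list and the CALI_SAMPLER_FREQUENCY variable name are assumptions (check cali-query --help sampler); the query string is the one shown above.

```shell
# Assumed service combination: sample, resolve symbols at flush, buffer snapshots, print a report.
export CALI_SERVICES_ENABLE=report,sampler,symbollookup,trace
# Assumed variable name for the sampling frequency in Hz.
export CALI_SAMPLER_FREQUENCY=100
# Flat profile: number of samples per function, most frequent first.
export CALI_REPORT_CONFIG="SELECT source.function#cali.sampler.pc,count() GROUP BY source.function#cali.sampler.pc FORMAT table ORDER BY count DESC"
```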


The symbollookup service provides function name, source file, and source line number lookup for binary program addresses from, e.g., stack unwinding or program counter sampling. The symbol lookup takes place during snapshot buffer flushes. It appends symbol attributes for each address attribute to the snapshots being flushed. For an address attribute address, the function and file/line number will be added in the source.function#address and sourceloc#address attributes, respectively. If a symbol lookup was unsuccessful for any reason, the value is set to UNKNOWN.


Explicitly select address attributes for which to perform symbol lookups. Colon-separated list. Default: empty, selects address attributes automatically via the class.symboladdress attribute class.


Perform function name lookup. TRUE or FALSE, default TRUE.


Perform source file and line number lookup. TRUE or FALSE, default TRUE. Combines file and line information in the sourceloc#address attribute, e.g. mysource.cpp:42 for file “mysource.cpp” and line number 42.


Perform source file lookup and write the file name to the source.file#address attribute. TRUE or FALSE, default FALSE.


Perform source line lookup and write the line number to the source.line#address attribute. TRUE or FALSE, default FALSE.


Perform module name lookup and write the module name to the module#address attribute. TRUE or FALSE, default FALSE.
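To illustrate how these lookup switches combine, the sketch below turns on the separate file, line, and module lookups for the sampler's program counter attribute. All CALI_SYMBOLLOOKUP_* names are assumptions based on Caliper's CALI_<SERVICE>_<OPTION> convention; cali-query --help symbollookup shows the actual option names.

```shell
# All variable names below are assumed; verify with cali-query --help symbollookup.
# Restrict lookups to the sampler's address attribute.
export CALI_SYMBOLLOOKUP_ATTRIBUTES=cali.sampler.pc
# Off by default: separate file, line, and module attributes.
export CALI_SYMBOLLOOKUP_LOOKUP_FILE=true
export CALI_SYMBOLLOOKUP_LOOKUP_LINE=true
export CALI_SYMBOLLOOKUP_LOOKUP_MODULE=true
```

If the options behave as described, flushed snapshots would then carry source.file#cali.sampler.pc, source.line#cali.sampler.pc, and module#cali.sampler.pc in addition to the default function and sourceloc attributes.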


The sysalloc service wraps malloc, free, calloc, and realloc memory allocation calls, and marks the allocated memory regions so they can be tracked with the alloc service.


The textlog service prints a text representation of snapshots to a configurable output stream. This can be used to print out a log of the program’s progress at runtime.

Currently, text log output can only be triggered by attribute update events. Therefore, the event service must be active as well. You can select which attribute updates trigger a text log output, define the output format, and set the output stream (stdout, stderr, or a file name).

The following example prints a text log for the phase attribute of the test application with Caliper’s auto-generated format string:

$ export CALI_SERVICES_ENABLE=event:textlog:timestamp
$ ./test/cali-basic
== CALIPER: Registered event trigger service
== CALIPER: Registered timestamp service
== CALIPER: Registered text log service
== CALIPER: Initialized
phase=main/init                                                       21
phase=main/loop                                                       84
phase=main                                                            219
== CALIPER: Finished

Select attributes which trigger a text log output. Note that the event service must be active in order to trigger snapshots in the first place, and the attributes selected here must be in the list of attributes that trigger snapshots (defined by CALI_EVENT_TRIGGER).


Define what to print. The format string can contain fields, denoted by %attribute_name%, which print the value of an attribute. Optionally, a field can contain a width specification, denoted by [width], to set the minimum width of a field. Any other text is printed verbatim. For example, Phase: %[32]app.phase% %[6]time.phase.duration% writes log strings with two fields: the value of the app.phase attribute with a minimum width of 32 characters, and the value of the time.phase.duration attribute with a minimum width of 6 characters. A resulting log entry might look like this:

Phase: main/loop                       7018

Default: empty; Caliper will automatically create a format string based on the selected trigger attributes.
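The minimum-width behavior can be pictured with the shell's printf. This is only an analogy, not Caliper code: it assumes values are left-justified and padded on the right, as the sample log output suggests, and the values are made up for illustration.

```shell
# Analogy for "Phase: %[32]app.phase% %[6]time.phase.duration%".
# %-32s left-justifies the value and pads it to at least 32 characters.
phase="main/loop"   # hypothetical app.phase value
duration="7018"     # hypothetical time.phase.duration value
line=$(printf 'Phase: %-32s %-6s' "$phase" "$duration")
# Brackets make the trailing padding visible.
printf '[%s]\n' "$line"
```

A value longer than the minimum width is printed in full; the width only adds padding, it never truncates.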


File name for the text log. May be set to stdout or stderr to print to the standard output or error streams, respectively.

Default: stdout
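Putting the three textlog options together, a configuration could look like this sketch. The names CALI_TEXTLOG_TRIGGER, CALI_TEXTLOG_FORMATSTRING, and CALI_TEXTLOG_FILENAME are assumptions (the option names are not shown in this section); CALI_EVENT_TRIGGER is the event service option referenced above.

```shell
export CALI_SERVICES_ENABLE=event,textlog,timestamp
# The textlog trigger attribute must also trigger event snapshots.
export CALI_EVENT_TRIGGER=phase
# Assumed variable names; verify with cali-query --help textlog.
export CALI_TEXTLOG_TRIGGER=phase
export CALI_TEXTLOG_FORMATSTRING='Phase: %[32]phase%'
# Send the log to stderr to keep it separate from program output.
export CALI_TEXTLOG_FILENAME=stderr
```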


The timer service adds a time offset, timestamp, or duration to context records. Note that timestamps are not synchronized between nodes in a distributed-memory program.


Calculates inclusive times for nested regions. The value will be saved in the snapshot record as attribute time.inclusive.duration.

The event service with event trigger information generation needs to be enabled for this feature.

Default: true


The trace service creates an I/O record for each snapshot. With the recorder service enabled, this will create a snapshot trace file.

The trace service maintains per-thread snapshot buffers. By default, trace buffers will grow automatically. This behavior can be changed by setting a buffer policy. There are three options:


Grow the buffer when it is full. This is the default.


Stop recording when the buffer is full.


Flush the buffer when it is full and continue recording. Note that buffer flushes can significantly perturb the program’s performance.


Size of the trace buffer, in megabytes. With the grow buffer policy, this is the size of a trace buffer chunk: when the buffer is full, another chunk of this size is added.

Default: 2 (MiB).


Sets the trace buffer policy (see above). Either grow, stop, or flush.

Default: grow.
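For example, to cap trace memory at a fixed buffer and drop further snapshots once it fills up, the two settings might be combined as below. CALI_TRACE_BUFFER_SIZE and CALI_TRACE_BUFFER_POLICY are assumed variable names; cali-query --help trace lists the real ones.

```shell
export CALI_SERVICES_ENABLE=event,recorder,trace
# Assumed variable names; verify with cali-query --help trace.
# Buffer size, in the same unit as the default of 2.
export CALI_TRACE_BUFFER_SIZE=8
# Stop recording when the buffer is full, instead of growing or flushing.
export CALI_TRACE_BUFFER_POLICY=stop
```

The stop policy bounds memory use at the cost of losing later snapshots, while flush keeps all data but can perturb timing.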


The umpire service records statistics from the Umpire memory manager. It provides the following attributes:

Umpire allocator name


Actual size of the allocator


Current size of the allocator


The memory high-watermark for the allocator


Number of allocations in the allocator

Total memory allocated across all allocators

Number of allocations across all allocators



Boolean. Record per-allocator statistics. If false, the umpire service only records the total memory and total allocation count attributes (across all allocators); otherwise, it also creates separate records for each Umpire allocator.

Default: true
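When only the totals are of interest, per-allocator records could be switched off as in this sketch. The variable name CALI_UMPIRE_PER_ALLOCATOR_STATISTICS and the service combination are assumptions; cali-query --help umpire shows the actual option.

```shell
# Service combination is an assumption; adjust to your recording setup.
export CALI_SERVICES_ENABLE=event,recorder,trace,umpire
# Assumed variable name: keep only the totals across all allocators.
export CALI_UMPIRE_PER_ALLOCATOR_STATISTICS=false
```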