OpenMP profiling
Caliper can profile OpenMP constructs with the help of the OpenMP tools
interface (OMPT). This requires a compiler with OMPT support, e.g. clang 9+.
Build Caliper with -DWITH_OMPT=On to enable it.
When OMPT support is enabled, Caliper provides the openmp-report built-in config, as well as the openmp.times, openmp.efficiency, and openmp.threads options for configs like runtime-report and hatchet-region-profile. With manual configurations, you can use the ompt service.
OpenMP profiling with openmp-report
The openmp-report config measures and prints the time spent in OpenMP constructs on each thread:
$ CALI_CONFIG=openmp-report ./caliper-openmp-example
Path  #Threads  Time (thread)  Time (total)  Work %     Barrier %  Time (work)  Time (barrier)
main            0.005122       0.027660      85.969388  14.030612
work  4         0.005110       0.027572      85.969388  14.030612  0.011121     0.001815
This shows example output for a program like the one in section Instrumentation. For Caliper regions with active OpenMP parallel regions inside them (like “work” in the example), the report prints the total CPU time across all threads spent doing OpenMP work (“Time (work)”) and waiting in OpenMP barriers (“Time (barrier)”). It also computes the “Work %” and “Barrier %” metrics, which indicate the relative amount of time spent in work and barrier regions, respectively. These two metrics are inclusive, so the “main” region shows the overall OpenMP efficiency for the entire program. The metrics are defined as follows:
- Path
The Caliper region hierarchy.
- #Threads
Maximum number of OpenMP threads in the region.
- Time (thread)
Time in seconds spent on the main thread (i.e., wall-clock time).
- Time (total)
Sum of CPU time in seconds across all threads.
- Work %
Percent of CPU time spent in OpenMP workshare regions vs. overall time in OpenMP. This is an inclusive metric (i.e., aggregated over all child regions).
- Barrier %
Percent of CPU time spent in barriers vs. overall time in OpenMP. This is also an inclusive metric.
- Time (work)
CPU time spent in OpenMP workshare regions (e.g., #pragma omp for loops).
- Time (barrier)
CPU time spent in OpenMP barriers (both implicit and explicit barriers).
Profiling by thread
With the “show_threads” option, we can see the times spent on each thread:
$ CALI_CONFIG=openmp-report,show_threads=1,show_regions=true ./caliper-openmp-example
Path  #Threads  Time (thread)  Time (total)  Thread  Work %      Barrier %  Time (work)  Time (barrier)
main
|-              0.000197       0.014312
|-                             0.002747      3       100.000000
|-                             0.003086      1       100.000000
|-                             0.002856      2       100.000000
|-              0.005075       0.005075      0       62.539218   37.460782
work
|-              0.000180       0.014228
|-    4                        0.002747      3       100.000000             0.002688
|-    4                        0.003086      1       100.000000             0.003018
|-    4                        0.002856      2       100.000000             0.002800
|-    4         0.005075       0.005075      0       62.539218   37.460782  0.002990     0.001791
We now see the additional “Thread” column with the OpenMP thread ID, and “Time (total)” shows the CPU time in each OpenMP thread per Caliper region. The rows without a “Thread” entry refer to time spent outside OpenMP parallel regions.
Thread summary view
With “show_regions=false”, we can disable the grouping by Caliper region and just show a performance summary across OpenMP threads:
$ CALI_CONFIG=openmp-report,show_threads=true,show_regions=false ./caliper-openmp-example
#Threads  Time (thread)  Time (total)  Thread  Work %      Barrier %  Time (work)  Time (barrier)
          0.000175       0.016727
4                        0.002760      3       100.000000             0.002722
4                        0.002958      1       100.000000             0.002901
4                        0.002768      2       100.000000             0.002699
4         0.004913       0.004913      0       62.682091   37.317909  0.002926     0.00174
OpenMP options for other configs
Caliper also provides OpenMP options for other profiling configs such as hatchet-region-profile. These are:
- openmp.times
Measure and return CPU time spent in OpenMP workshare and barrier regions.
- openmp.efficiency
Compute the “Work %” and “Barrier %” metrics as in the openmp-report config shown above.
- openmp.threads
Group by thread (i.e., record metrics for each OpenMP thread separately), as with the “show_threads” option for openmp-report shown above.
Instrumentation
When instrumenting the code, make sure to use “process”-scope annotations to mark code regions outside of OpenMP parallel regions, or put thread-scope annotations onto every OpenMP thread. Mixing both approaches is possible but not recommended, since it produces separate region hierarchies for the process and thread scopes. See Notes on multi-threading for more information.
Use the CALI_CALIPER_ATTRIBUTE_DEFAULT_SCOPE config flag to define whether Caliper regions use process or thread scope. They use thread scope by default. The following example sets the attribute default scope to “process” so that the “main” and “work” Caliper regions are visible on all OpenMP threads inside the parallel region:
#include <caliper/cali.h>

int main()
{
    // Set before the first annotation so it takes effect when Caliper initializes
    cali_config_set("CALI_CALIPER_ATTRIBUTE_DEFAULT_SCOPE", "process");

    CALI_MARK_BEGIN("main");
    CALI_MARK_BEGIN("work");

    #pragma omp parallel
    {
        // ...
    }

    CALI_MARK_END("work");
    CALI_MARK_END("main");
}
Enabling OMPT
Caliper enables the OpenMP tools interface automatically when the ompt service is active. This can fail in some cases, most often when the OpenMP runtime is initialized before Caliper. If that happens, set the CALI_USE_OMPT environment variable to “1” or “true” to enable OpenMP support manually.