Sample profiling

Sampling is an easy way to provide additional detail over Caliper’s source-code region annotations by showing where within a region the program spends its time. Sampling is currently available for Linux. It requires the -DWITH_SAMPLING=On, -DWITH_LIBDW=On, and (optional) -DWITH_LIBUNWIND=On options when building Caliper.

The sample-report configuration prints a sampling report with the time spent in the sampled source-code functions within each Caliper region. For the cxx-example program, we usually find that almost all of its time is spent in the __nanosleep system function:

$ CALI_CONFIG=sample-report ./examples/apps/cxx-example
Path       Samples Time (sec) Function
main
  mainloop
    foo         71   0.355000 __nanosleep

There are several options for sample-report:

source.function

Print the function symbol name where samples hit (on by default)

source.location

Print the source file name and line number where samples hit

source.module

Print the source module (.so/.exe) where samples hit

callpath

Print the function call path instead of the Caliper region hierarchy.

sample.frequency

The sample frequency. The default is 200Hz (one sample every 5 milliseconds). Increasing the sampling frequency provide more detail at the cost of higher overheads and processing times.

As an example, source.location shows us source file and line numbers of sampled locations:

$ CALI_CONFIG=sample-report,source.function=false,source.location ./examples/apps/cxx-example
Path       Samples Time (sec) Source
main
  mainloop
   |-            1   0.005000 ../src/caliper/Caliper.cpp:992
   |-            1   0.005000 ../include/caliper/comm~~ataAccessInterface.h:22
    foo         66   0.330000 ../sysdeps/unix/syscall-template.S:81

For MPI programs, sample-report prints statistics aggregated across MPI ranks, just like runtime-report:

$ CALI_CONFIG=sample-report srun -n 8 lulesh2.0
Path         Min time/rank Max time/rank Avg time/rank Total time Time %    Function
main
 |-               0.005000      0.005000      0.005000   0.035000  0.059691 Domain::AllocateElemPersistent
 |-               0.005000      0.005000      0.005000   0.035000  0.059691 Domain::SetupThreadSupportStru
 |-               0.005000      0.005000      0.005000   0.005000  0.008527 sysmalloc
 |-               0.005000      0.005000      0.005000   0.005000  0.008527 Domain::BuildMesh(int, int, in
lulesh.cycle
    TimeIncrement
     |-           0.075000      0.740000      0.355000   2.840000  4.843523 gomp_barrier_wait_end
     |-           0.005000      0.060000      0.027857   0.195000  0.332566 psm2_mq_ipeek2
     |-           0.005000      0.005000      0.005000   0.015000  0.025582 psm_no_lock
     |-           0.005000      0.060000      0.023571   0.165000  0.281402 psm_progress_wait
     |-           0.015000      0.030000      0.022143   0.155000  0.264347 mv2_shm_bcast
     |-           0.005000      0.025000      0.013750   0.055000  0.093801 amsh_poll
     |-           0.005000      0.010000      0.007500   0.030000  0.051164 psmi_poll_internal
...

The hatchet-sample-profile config recipe writes machine-readable profiles for processing with Hatchet or cali-query. The profile contains sampling data for each MPI rank. By default, it writes a JSON file for processing with Hatchet, but you can use output.format=cali to write a .cali file for processing with cali-query, the Caliper Python reader library, or Hatchet’s .cali importer. hatchet-sample-profile supports the same options as sample-report above.

With output.format=cali option, the .cali file contains records with the following attributes:

count

Sample count metric (number of times a sample hit)

scount

Scaled count metric: sample count times sample frequency. Represents time in seconds.

source.function#cali.sampler.pc

Sampled function symbol name, if enabled with the source.function option

source.module#cali.sampler.pc

Sampled module name, if enabled with the source.module option

sourceloc#cali.sampler.pc

Sampled source file name + line number, if enabled with the source.location option

source.function#callpath.address

The function call path, if enabled with the callpath option

In this example, we collect a sampling profile in .cali format and use cali-query to compute a flat profile with the time and number of samples in each sampled function:

$ CALI_CONFIG=hatchet-sample-profile,source.function,output.format=cali ./lulesh2.0
$ cat << EOF > query.txt
> select
>   source.function#cali.sampler.pc as Function,
>   sum(count) as Samples,
>   sum(scount) as Time,
>   percent_total(count) as Percent
> group by
>   source.function#cali.sampler.pc
> format
>   table
> order by
>   sum#count DESC
> EOF
$ cali-query -Q query.txt sample_profile.cali
Function                                                     Samples Time       Percent
gomp_barrier_wait_end                                          62311 311.555000 26.918873
CalcFBHourglassForceForElems(~~ int, int) [clone ._omp_fn.7]   30673 153.365000 13.250993
CalcHourglassControlForElems(~~*, double) [clone ._omp_fn.6]   29055 145.275000 12.552003
gomp_team_barrier_wait_end                                     21708 108.540000  9.378038
IntegrateStressForElems(Domai~~ int, int) [clone ._omp_fn.4]   18856  94.280000  8.145950
CalcKinematicsForElems(Domain~~uble, int) [clone ._omp_fn.0]   11793  58.965000  5.094675
CalcMonotonicQGradientsForElems(Domain&) [clone ._omp_fn.14]    6489  32.445000  2.803302
brk                                                             5848  29.240000  2.526385
psm2_mq_ipeek2                                                  4067  20.335000  1.756978
...