Sample profiling¶
Sampling is an easy way to provide additional detail over Caliper’s source-code region annotations by showing where within a region the program spends its time. Sampling is currently available for Linux. It requires the -DWITH_SAMPLING=On, -DWITH_LIBDW=On, and (optional) -DWITH_LIBUNWIND=On options when building Caliper.
The sample-report configuration prints a sampling report with the time
spent in the sampled source-code functions within each Caliper region. For the
cxx-example program, we usually find that almost all of its time is spent
in the __nanosleep
system function:
$ CALI_CONFIG=sample-report ./examples/apps/cxx-example
Path Samples Time (sec) Function
main
mainloop
foo 71 0.355000 __nanosleep
There are several options for sample-report:
- source.function
Print the function symbol name where samples hit (on by default)
- source.location
Print the source file name and line number where samples hit
- source.module
Print the source module (.so/.exe) where samples hit
- callpath
Print the function call path instead of the Caliper region hierarchy.
- sample.frequency
The sample frequency. The default is 200Hz (one sample every 5 milliseconds). Increasing the sampling frequency provide more detail at the cost of higher overheads and processing times.
As an example, source.location shows us source file and line numbers of sampled locations:
$ CALI_CONFIG=sample-report,source.function=false,source.location ./examples/apps/cxx-example
Path Samples Time (sec) Source
main
mainloop
|- 1 0.005000 ../src/caliper/Caliper.cpp:992
|- 1 0.005000 ../include/caliper/comm~~ataAccessInterface.h:22
foo 66 0.330000 ../sysdeps/unix/syscall-template.S:81
For MPI programs, sample-report prints statistics aggregated across MPI ranks, just like runtime-report:
$ CALI_CONFIG=sample-report srun -n 8 lulesh2.0
Path Min time/rank Max time/rank Avg time/rank Total time Time % Function
main
|- 0.005000 0.005000 0.005000 0.035000 0.059691 Domain::AllocateElemPersistent
|- 0.005000 0.005000 0.005000 0.035000 0.059691 Domain::SetupThreadSupportStru
|- 0.005000 0.005000 0.005000 0.005000 0.008527 sysmalloc
|- 0.005000 0.005000 0.005000 0.005000 0.008527 Domain::BuildMesh(int, int, in
lulesh.cycle
TimeIncrement
|- 0.075000 0.740000 0.355000 2.840000 4.843523 gomp_barrier_wait_end
|- 0.005000 0.060000 0.027857 0.195000 0.332566 psm2_mq_ipeek2
|- 0.005000 0.005000 0.005000 0.015000 0.025582 psm_no_lock
|- 0.005000 0.060000 0.023571 0.165000 0.281402 psm_progress_wait
|- 0.015000 0.030000 0.022143 0.155000 0.264347 mv2_shm_bcast
|- 0.005000 0.025000 0.013750 0.055000 0.093801 amsh_poll
|- 0.005000 0.010000 0.007500 0.030000 0.051164 psmi_poll_internal
...
The hatchet-sample-profile config recipe writes machine-readable profiles for processing with Hatchet or cali-query. The profile contains sampling data for each MPI rank. By default, it writes a JSON file for processing with Hatchet, but you can use output.format=cali to write a .cali file for processing with cali-query, the Caliper Python reader library, or Hatchet’s .cali importer. hatchet-sample-profile supports the same options as sample-report above.
With output.format=cali option, the .cali file contains records with the following attributes:
- count
Sample count metric (number of times a sample hit)
- scount
Scaled count metric: sample count times sample frequency. Represents time in seconds.
- source.function#cali.sampler.pc
Sampled function symbol name, if enabled with the source.function option
- source.module#cali.sampler.pc
Sampled module name, if enabled with the source.module option
- sourceloc#cali.sampler.pc
Sampled source file name + line number, if enabled with the source.location option
- source.function#callpath.address
The function call path, if enabled with the callpath option
In this example, we collect a sampling profile in .cali format and use cali-query to compute a flat profile with the time and number of samples in each sampled function:
$ CALI_CONFIG=hatchet-sample-profile,source.function,output.format=cali ./lulesh2.0
$ cat << EOF > query.txt
> select
> source.function#cali.sampler.pc as Function,
> sum(count) as Samples,
> sum(scount) as Time,
> percent_total(count) as Percent
> group by
> source.function#cali.sampler.pc
> format
> table
> order by
> sum#count DESC
> EOF
$ cali-query -Q query.txt sample_profile.cali
Function Samples Time Percent
gomp_barrier_wait_end 62311 311.555000 26.918873
CalcFBHourglassForceForElems(~~ int, int) [clone ._omp_fn.7] 30673 153.365000 13.250993
CalcHourglassControlForElems(~~*, double) [clone ._omp_fn.6] 29055 145.275000 12.552003
gomp_team_barrier_wait_end 21708 108.540000 9.378038
IntegrateStressForElems(Domai~~ int, int) [clone ._omp_fn.4] 18856 94.280000 8.145950
CalcKinematicsForElems(Domain~~uble, int) [clone ._omp_fn.0] 11793 58.965000 5.094675
CalcMonotonicQGradientsForElems(Domain&) [clone ._omp_fn.14] 6489 32.445000 2.803302
brk 5848 29.240000 2.526385
psm2_mq_ipeek2 4067 20.335000 1.756978
...