Memory usage profiling

Caliper provides basic functionality to profile memory usage on Linux systems.

Heap allocation statistics

The alloc.stats option reports heap memory allocation statistics. To do so Caliper intercepts all malloc and free operations, so enabling this option can be somewhat expensive for programs with many small memory allocations. The output contains the following metrics:

Mem HWM

The memory high-water mark in bytes. Maximum total amount of memory that was allocated in this region.

Alloc tMax

Allocation tally maximum. This is the maximum total amount of memory that was allocated directly by this region. In contrast, the memory high-water mark includes memory allocated in any region.

Alloc count

Number of individual memory allocations.

Min, Avg, Max Bytes/alloc

Minimum, average and maximum bytes allocated per allocation operation.

An example output looks like this:

$ CALI_CONFIG=runtime-report,alloc.stats ./lulesh2.0
Path                                       Time (E) Time (I) Time % (E) Time % (I) Mem HWM  Alloc tMax Alloc count Min Bytes/alloc Avg Bytes/alloc Max Bytes/alloc
main                                       0.004867 0.100623   4.712664  97.423931  8750898    8750898          71               1          124779          864000
  lulesh.cycle                             0.000259 0.095755   0.250617  92.711267  8740194         96          10              96              96              96
    TimeIncrement                          0.000084 0.000084   0.081690   0.081690  8740194
    LagrangeLeapFrog                       0.000488 0.095412   0.472577  92.378960  8740194         96          20              96              96              96
      LagrangeNodal                        0.001119 0.060091   1.083415  58.180911  8740194
        CalcForceForNodes                  0.000490 0.058972   0.474139  57.097497  8740194        216           2              24             108             192
          CalcVolumeForceForElems          0.000710 0.058483   0.687355  56.623357  9604194     864000          40          216000          216000          216000
            IntegrateStressForElems        0.014498 0.014498  14.036874  14.036874 14788194    5184000          30         1728000         1728000         1728000
            CalcHourglassControlForElems   0.021265 0.043275  20.588633  41.899129 19972194   10368000          60         1728000         1728000         1728000
              CalcFBHourglassForceForElems 0.022010 0.022010  21.310495  21.310495 25156194    5184000          30         1728000         1728000         1728000
      LagrangeElements                     0.000215 0.034296   0.207968  33.205399  8740194
        CalcLagrangeElements               0.000369 0.007101   0.357067   6.875254  9388194     648000          30          216000          216000          216000
          CalcKinematicsForElems           0.006732 0.006732   6.518187   6.518187  9388194
        CalcQForElems                      0.003196 0.005037   3.094518   4.877323 10165794    1425600          60          216000          237600          259200
          CalcMonotonicQForElems           0.001841 0.001841   1.782805   1.782805 10165794
        ApplyMaterialPropertiesForElems    0.000541 0.021942   0.524234  21.244854  8956194     216000          10          216000          216000          216000
          EvalEOSForElems                  0.007972 0.021401   7.718625  20.720620  9902034     945840        1540             984           19636           67560
            CalcEnergyForElems             0.013429 0.013429  13.001996  13.001996  9969594      67560         350             984           45988           67560

Note that alloc.stats only accounts for memory allocations through C heap allocation calls (malloc, calloc, realloc, free) within the active profiling session. It does not account for any memory allocated through other means, including:

  • Allocations that happened before initializing the Caliper profiling session

  • Objects allocated in static memory or on the stack

  • Text segments (i.e., program code)

  • GPU memory allocated through cudaMalloc or hipMalloc etc.

  • Any heap allocations through other means, such as calling mmap directly or CUDA/HIP host memory allocation calls unless they fall back to malloc/free

Memory page use statistics

The mem.pages option provides high-water marks for total (VmSize), resident set (VmRSS), and Data memory usage in number of pages, similar to memory statistics reported by tools like top.

Unlike the alloc.stats option this feature does not require intercepting malloc/free calls, so it should induce less runtime overhead in programs using many small memory allocations. It does however induce a medium per-region overhead for reading information from /proc. Therefore, it may be advisable to limit profiling to high-level regions (e.g. with level=phase if the target code distinguishes regular and phase regions).

Also unlike the alloc.stats option, mem.pages captures all memory usage (except GPUs) including code segments, static memory, etc. However, according to the Linux documentation, the reported numbers may not be entirely accurate.

The option adds VmSize, VmRSS, and Data metrics representing the high-water mark in number of pages observed in the respective categories. Note the number of pages in the example is highest in the CalcFBHourglassForceForElems function, just as in the alloc.stats output before:

$ CALI_CONFIG=runtime-report,mem.highwatermark ./lulesh2.0
[...]
Path                                       Time (E) Time (I) Time % (E) Time % (I) VmSize VmRSS Data
main                                       0.010168 9.979632   0.101883  99.997550  79862  6786 33651
  lulesh.cycle                             0.000085 9.969464   0.000855  99.895667  81740  7665 35529
    TimeIncrement                          0.000047 0.000047   0.000469   0.000469  81740  7665 35529
    LagrangeLeapFrog                       0.000839 9.969332   0.008402  99.894343  81740  7665 35529
      LagrangeNodal                        0.056944 0.239477   0.570587   2.399594  81740  7665 35529
        CalcForceForNodes                  0.017880 0.182533   0.179165   1.829007  81740  7665 35529
          CalcVolumeForceForElems          0.039597 0.164652   0.396773   1.649842  81740  7742 35529
            IntegrateStressForElems        0.051738 0.051738   0.518423   0.518423  81740  7742 35529
            CalcHourglassControlForElems   0.030218 0.073317   0.302787   0.734645  83849  9804 37638
              CalcFBHourglassForceForElems 0.043099 0.043099   0.431859   0.431859  84271 10196 38060
      LagrangeElements                     0.012675 9.057428   0.127006  90.756916  81740  7665 35529