MPI Profiling
================================

Caliper's built-in profiling recipes support MPI natively and automatically
aggregate performance data across all MPI ranks. In addition, Caliper provides
MPI-specific performance statistics, such as the time spent in MPI functions
and the number and size of messages.

MPI Function Profiling
--------------------------------

The `mpi-report` config recipe lists the number of invocations and the time
spent in each MPI function (min/max/avg across MPI ranks). It works similarly
to the mpiP profiling tool. The first row, which has an empty function name,
shows the time outside of MPI (i.e., the computation time of the program)::

    $ CALI_CONFIG=mpi-report srun -n 8 ./lulesh2.0
    Function                Count (min) Count (max) Time (min) Time (max) Time (avg)    Time %
                                    446         518   0.315387   0.415731   0.353483 83.299370
    MPI_Allreduce                    10          11   0.000281   0.068973   0.045038 10.613409
    MPI_Wait                        107         177   0.000795   0.032157   0.014788  3.484918
    MPI_Barrier                       2           2   0.000051   0.007671   0.005110  1.204122
    MPI_Isend                       107         177   0.002300   0.002799   0.002571  0.605904
    MPI_Waitall                      31          31   0.000482   0.001858   0.001149  0.270677
    MPI_Comm_split                    2           2   0.000176   0.001925   0.000999  0.235499
    MPI_Irecv                       107         177   0.000446   0.000767   0.000631  0.148605
    MPI_Bcast                         4           4   0.000054   0.000500   0.000436  0.102674
    MPI_Reduce                        1           1   0.000032   0.000296   0.000072  0.017057
    MPI_Comm_dup                      1           1   0.000038   0.000066   0.000052  0.012178
    MPI_Comm_free                     2           2   0.000012   0.000015   0.000013  0.003020
    MPI_Get_library_version          1           1   0.000007   0.000010   0.000008  0.001972
    MPI_Gather                        1           1   0.000020   0.000020   0.000020  0.000594

The `profile.mpi` option is available for most built-in profiling recipes, such
as `runtime-report` or `hatchet-region-profile`. It shows the time spent in MPI
functions within each Caliper region::

    $ CALI_CONFIG=runtime-report,profile.mpi srun -n 8 ./lulesh2.0
    Path                                          Min time/rank Max time/rank Avg time/rank    Time %
    main                                               0.007467      0.007918      0.007664  1.775109
      CommRecv                                         0.000036      0.000068      0.000044  0.010198
        MPI_Irecv                                      0.000036      0.000076      0.000044  0.010223
      CommSend                                         0.000045      0.000053      0.000047  0.010781
        MPI_Isend                                      0.000597      0.000621      0.000608  0.140909
        MPI_Waitall                                    0.000015      0.000020      0.000016  0.003809
      CommSBN                                          0.000035      0.000042      0.000037  0.008527
        MPI_Wait                                       0.000027      0.000286      0.000147  0.034127
      MPI_Barrier                                      0.000013      0.000104      0.000065  0.014967
      lulesh.cycle                                     0.000212      0.000252      0.000228  0.052810
        TimeIncrement                                  0.000085      0.000107      0.000091  0.021138
          MPI_Allreduce                                0.000210      0.071229      0.046590 10.791564
        LagrangeLeapFrog                               0.000263      0.000408      0.000320  0.074115
          LagrangeNodal                                0.004715      0.005330      0.005034  1.165980
            CalcForceForNodes                          0.000624      0.000747      0.000694  0.160774
              CommRecv                                 0.000242      0.000287      0.000265  0.061496
                MPI_Irecv                              0.000240      0.000332      0.000280  0.064767
              CalcVolumeForceForElems                  0.001827      0.002038      0.001919  0.444573
                IntegrateStressForElems                0.034616      0.038880      0.036624  8.483035
                CalcHourglassControlForElems           0.095108      0.102434      0.098921 22.912601
                  CalcFBHourglassForceForElems         0.062848      0.071650      0.067722 15.686204
              CommSend                                 0.000838      0.000949      0.000890  0.206152
                MPI_Isend                              0.000899      0.001126      0.000999  0.231382
                MPI_Waitall                            0.000140      0.000298      0.000216  0.049950
              CommSBN                                  0.000558      0.000692      0.000615  0.142361
                MPI_Wait                               0.000334      0.018442      0.008193  1.897615
              CommRecv                                 0.000044      0.000249      0.000152  0.035190
                MPI_Irecv                              0.000042      0.000281      0.000162  0.032884
              CommSend                                 0.000086      0.001056      0.000560  0.129665
                MPI_Isend                              0.000067      0.000525      0.000327  0.066173
                MPI_Waitall                            0.000043      0.002110      0.000808  0.187086
    (...)

Message statistics
................................

The `mpi.message.count` and `mpi.message.size` options show the number of
messages and the number of transferred bytes for both point-to-point and
collective communication operations.
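For example, the options can be added to the config string like any other
recipe option. This specific combination is an illustration, not one of the
examples above, and assumes a Caliper build with MPI support::

    $ CALI_CONFIG=mpi-report,mpi.message.count,mpi.message.size srun -n 8 ./lulesh2.0

The report then includes the message counts and transferred bytes alongside
the timing data.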
MPI Function filtering
................................

You can use the `mpi.include` and `mpi.exclude` options to explicitly select
or filter out the MPI operations to capture. This is more efficient than
filtering MPI functions with the name-based `include_regions` or
`exclude_regions` options. As an example, we can use `mpi.include` to only
measure `MPI_Allreduce`::

    $ CALI_CONFIG=runtime-report,profile.mpi,mpi.include=MPI_Allreduce srun -n 8 ./lulesh2.0
    Path                        Min time/rank Max time/rank Avg time/rank    Time %
    main                             0.007588      0.008178      0.007737  1.834651
      CommRecv                       0.000024      0.000034      0.000029  0.006834
      CommSend                       0.000551      0.000631      0.000594  0.140933
      CommSBN                        0.000019      0.000046      0.000031  0.007338
      lulesh.cycle                   0.000218      0.000254      0.000233  0.055250
        TimeIncrement                0.000088      0.000111      0.000098  0.023188
          MPI_Allreduce              0.000180      0.066283      0.042576 10.095752
        LagrangeLeapFrog             0.000278      0.000364      0.000324  0.076719
          LagrangeNodal              0.004838      0.005261      0.005013  1.188590
            CalcForceForNodes        0.000622      0.000839      0.000740  0.175528
              CommRecv               0.000056      0.000076      0.000065  0.015465
    (...)
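The `mpi.exclude` option works the other way around: the listed MPI functions
are dropped from measurement while all others are kept. As an illustrative
invocation (not taken from the original example set), the frequently called
`MPI_Wait` operations could be skipped like this::

    $ CALI_CONFIG=runtime-report,profile.mpi,mpi.exclude=MPI_Wait srun -n 8 ./lulesh2.0

The resulting report looks like the `profile.mpi` output above, minus the
`MPI_Wait` rows.

The region names in the reports above (`main`, `CommRecv`, `LagrangeLeapFrog`,
and so on) come from Caliper annotations in the LULESH source code. The
following minimal sketch, which is not taken from LULESH and assumes a Caliper
build with MPI support, illustrates how annotated regions and intercepted MPI
functions combine in the `profile.mpi` output (the region name "reduce" is
purely illustrative)::

    #include <caliper/cali.h>   // Caliper annotation macros (CALI_MARK_BEGIN/END)
    #include <mpi.h>

    int main(int argc, char* argv[])
    {
        MPI_Init(&argc, &argv);

        double local = 1.0, global = 0.0;

        // With profile.mpi enabled, MPI calls made inside this region are
        // reported as children of "reduce" in the report tree.
        CALI_MARK_BEGIN("reduce");
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        CALI_MARK_END("reduce");

        MPI_Finalize();
        return 0;
    }

Running this program with `CALI_CONFIG=runtime-report,profile.mpi` would show
the `MPI_Allreduce` time nested under the `reduce` region.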