The CalQL query language¶
The Caliper Query Language (CalQL) is used to filter, aggregate, and create reports from Caliper data in Caliper’s data processing tools (cali-query) and report services. Its syntax is very similar to SQL.
For example, filtering and aggregating a trace into a function time profile for an application’s main loop and printing results in a table, sorted by time, is accomplished with the following statement:
SELECT *, count(), sum(time.duration.ns)
WHERE loop=mainloop
GROUP BY function
FORMAT table
ORDER BY time.duration.ns
Quick reference¶
This table contains a quick reference of all CalQL statements:
LET <list> # Compute additional attributes in input records
<a>=ratio(<b>,<c>,<S>) # computes a = <b>/<c>*S
<a>=scale(<b>,<S>) # computes <a> = <b>*S
<a>=truncate(<b>,<S>) # computes <a> = <b> - mod(<b>, S)
<a>=first(<a0>,<a1>, ...) # <a> is the first of <a0>, <a1>, ... found in the input record
<a>=sum(<a0>,<a1>,...) # computes sum of <a0>, <a1>, ... in the input record
<a>=leaf(<b>) # Takes the leaf of the region hierarchy <b>
... IF <condition> # apply only if input record meets condition
SELECT <list> # Select attributes and define aggregations (i.e., select columns)
* # select all attributes
path # select the nested region hierarchy
<attribute> # select <attribute>
count() # number of input records in grouping
sum(<attribute>) # compute sum of <attribute> for grouping
min(<attribute>) # compute min of <attribute> for grouping
max(<attribute>) # compute max of <attribute> for grouping
avg(<attribute>) # compute average of <attribute> for grouping
percent_total(<attribute>) # compute percent of total sum for <attribute> in grouping
ratio(<a>,<b>,<S>) # compute sum(<a>)/sum(<b>)*S for grouping
inclusive_sum(<a>) # computes inclusive sum of <a> in a nested hierarchy
scale(<a>,<S>) # computes sum of <a> multiplied by scaling factor S
inclusive_scale(<a>,<S>) # computes inclusive scaled sum of <a> in a nested hierarchy
inclusive_percent_total(<a>) # computes inclusive percent-of-total of <a> in a nested hierarchy
any(<attribute>) # pick one value for <attribute> out of all records in grouping
scale_count(<S>) # count number of input records for grouping and multiplied by <S>
inclusive_ratio(<a>,<b>,<S>) # computes inclusive_sum(<a>)/inclusive_sum(<b>)*<S>
inclusive_min(<a>) # compute inclusive min of <a>
inclusive_max(<a>) # compute inclusive max of <a>
variance(<a>) # compute population variance (sum(a^2)/N - avg(a)^2) of <a>
... AS <name> # use <name> as column header in tree or table formatter
... UNIT <unit> # use <unit> as unit name
GROUP BY <list> # Define aggregation grouping (what to aggregate over, e.g. "function,mpi.rank")
<attribute> # include <attribute> in grouping
path # include the nested region hierarchy in grouping
WHERE <list> # define filter (i.e., select records/rows)
<attribute> # records where <attribute> exists
<attribute> = <value> # records where <attribute> = <value>
<attribute> < <value> # records where <attribute> > <value>
<attribute> > <value> # records where <attribute> < <value>
NOT <clause> # negate the filter clause
FORMAT <formatter> # Define output format
cali # .cali format
expand # “<attribute1>=<value1>,<attibute2>=<value2>,...”
json # write json records { “attribute1”: “value1”, “attribute2”: “value2” }
(pretty) # ... in a more human-readable format
(object) # ... in a json object that includes attribute info and globals
json-split # json format w/ separate node hierarchy for hatchet library
table # human-readable text table
tree # human-readable text tree output
ORDER BY <list> # Sort records in output (table and tree formatters)
<attribute> # order by <attribute>
... ASC # sort in ascending order
... DESC # sort in descending order
LET¶
The LET statement applies computation operators on input record entries and adds the results as a new entries in the input records. The new entries can then be used in subsequent SELECT, GROUP BY, or FORMAT statements. For example, we can use the scale() operator to scale a value before subsequent aggregations:
LET
sec=scale(time.duration.ns,1e-9)
SELECT
path,sum(sec)
We can use the truncate() operator on an iteration counter to aggregate blocks of 10 iterations in a time-series profile:
LET
block=truncate(iteration#mainloop,10)
SELECT
block,sum(time.duration.ns)
GROUP BY
block
The first() operator returns the first attribute out of a list of attribute names found in an input record. It can also be used to rename attributes:
LET
time=first(time.duration.ns,sum#time.duration.ns)
SELECT
sum(time) AS Time
GROUP BY
path
LET terms have the general form
a = f(…) [IF <condition>]
where f is one of the operators and a is the name of the result attribute. The result is added to the input records before the record is processed further. Result entries are only added to a record if all required input operands are present.
With the optional IF condition, the operation is only applied for input records that meet a condition. One can use this to compute values for a specific subset of records. The condition clauses use the same syntax as WHERE filter clauses. The example below defines a “work” attribute with the time in records that contain “omp.work” regions, and then uses that to compute efficiency from the total and “work” time:
- LET
work=first(time.duration.ns) IF omp.work
- SELECT
sum(time.duration.ns) AS Total, sum(work) AS Work, ratio(work,time.duration.ns) AS Efficiency
- GROUP BY
path
SELECT¶
The SELECT statement selects which attributes in snapshot records
should be printed, and defines aggregations. SELECT *
selects all
attributes. SELECT op(arg)
enables aggregation operation op with
argument arg. Generally, attributes will be printed in the order
given in the SELECT statement.
An example to print all attributes and enable visit count aggregation:
SELECT *, count()
Aggregation operations create a new output attribute. The name is typically
derived from the input attribute(s). For example, the result of sum(attr)
is stored in sum#attr
. All selection attributes and aggregation
arguments must come from the input data; recursive
aggregations (e.g., min(count())
) within a query are not supported.
The AS
keyword assigns a user-defined name to a selected attribute
or aggregation result. Example:
SELECT sum(time.duration.ns) AS "Time (nsec)" FORMAT table
Here, the table formatter uses “Time (nsec)” instead of “sum#time.duration.ns” as
column name for the sum(time.duration.ns)
column. Only some
formatters (table, tree, json, and json-split) support AS
.
Inclusive aggregation operations (inclusive_sum, inclusive_scale, and
inclusive_percent_total) compute inclusive values (value for a tree node
plus all of its children) for datasets with hierarchical regions. This
applies to the hierarchy defined by attributes with the
CALI_ATTR_NESTED
property, including the default function,
annotation, and loop attributes from Caliper’s high-level annotation macros.
A more complex example:
SELECT
*,
scale(time.duration.ns,1e-9) AS Time,
inclusive_percent_total(time.duration.ns) AS "Time %"
GROUP BY
path
FORMAT
tree
The computes the (exclusive) sum of time.duration.ns divided by 100000000 and the inclusive percent-of-total for time.duration.ns. Example output:
Path Time Time %
main 5 100
foo 35 90
bar 10 20
WHERE¶
The WHERE statement can be used to filter the records to aggregate/print. The statement takes a comma-separated list of clauses. Records that don’t match all of the clauses are filtered out. Filters can only be defined on input attributes, i.e. it is not possible to filter on aggregation results.
Currently, there are clauses to test for existance of an attribute
label in a record, and to filter for specific attribute
values. Clauses can be negated with NOT
. Example:
WHERE loop,function=foo
Select records that contain loop
attributes where function equals
foo
. Note: for nested attributes, the match may occur on any
level. The above example matches the following:
{ loop=mainloop, iteration=5, function=main/foo } Matches
{ loop=mainloop, iteration=5, function=main/foo/bar } Matches ('function=foo' will be matched on any nesting level)
{ function=main/foo } No match: 'loop' attribute missing
The NOT
keyword negates clauses:
WHERE NOT iteration#mainloop=4
matches every record where iteration#mainloop
is not 4 (including
records without iteration#mainloop
attributes.
GROUP BY¶
The GROUP BY statement defines the aggregation key for aggregation operations. The aggregation key defines for which attributes separate (aggregate) records will be kept. That is, the aggregator will generate an aggregate record for each unique combination of key values found in the input records. The values of the aggregation attributes in the input records will be accumulated and appended to the aggregate record.
Consider the following table of input records:
function loop iteration
mainloop
foo mainloop 0
bar mainloop 0
foo mainloop 1
bar mainloop 1
foo mainloop 2
bar mainloop 2
With this input, the following GROUP BY statement will create a function profile:
SELECT *, count() GROUP BY function
function count
foo 3
bar 3
FORMAT¶
The FORMAT statement selects the output format option. Caliper can produce machine-readable (e.g., json or Caliper’s own csv-style) or human-readable output (text tables or a tree representation).
See Output formatting for a list of available formatters.
ORDER BY¶
Sort output records by the given sort criteria. The statement defines a list of attributes to sort output records by. Records can be sorted ascending (using the ASC keyword) or descending (using DESC). Sorting is available in the “tree” and “table” formatters.
The following example prints a iteration/function profile ordered by
time and iteration number. Note that one must use the original
attribute name and not an alias assigned with AS
:
SELECT
*,
sum(time.inclusive.duration.ns) AS Time
FORMAT
table
ORDER BY
sum#time.inclusive.duration.ns DESC,
iteration#mainloop
function loop iteration#mainloop Time
main 100000
main mainloop 80000
main/foo mainloop 0 2500
main mainloop 0 1500
main/foo mainloop 1 3500
main mainloop 1 2000
main mainloop 2 1000
main/foo mainloop 2 600
...