Hatchet is a Python-based library to analyze performance data that has a hierarchy (derived from calling context trees, call graphs, callpath traces, nested regions’ timers, etc.). Hatchet implements various operations to analyze a single hierarchical data set or compare multiple data sets.
Download and Install
Hatchet is available on GitHub.
Hatchet requires a Python installation (2.7+), matplotlib, numpy and pandas.
Hatchet is a Python tool that simplifies the process of analyzing hierarchical performance data such as calling context trees. Hatchet uses pandas dataframes to store the data on each node of the hierarchy and stores the graph relationships between the nodes in a separate data structure that is kept consistent with the dataframe.
Supported Input File Formats
Currently, hatchet supports two file formats as input: an HPCToolkit database and a Caliper JSON file.
Graphframe is the main data structure in hatchet that stores the
performance data that is read in from an HPCToolkit database or Caliper JSON
file. Typically, the raw input data is in the form of a tree. However, since
subsequent operations on the tree can lead to new edges being created which can
turn the tree into a graph, we store the input data as a directed graph. The
graphframe consists of a graph object that stores the edge relationships
between nodes and a dataframe that stores different metrics (numerical data)
and categorical data associated with each node.
The graph can be connected or disconnected (multiple roots) and each node in the graph can have one or more parents and children. The node stores its callpath, which is a tuple of the node names from the root to this node. This is used as one of the indices in the dataframe.
The dataframe holds all the numerical and categorical data associated with each node. Since typically the call tree data is per process, a multiindex composed of the node and MPI rank is used to index into the dataframe.
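The two-part layout described above can be illustrated with a minimal, standard-library sketch. This is not Hatchet's implementation: the `Node` class, the `table` dictionary (a stand-in for the pandas dataframe), and the `rows_for_node` helper are hypothetical names chosen for illustration, and each node is assumed to have a single parent for simplicity.

```python
class Node:
    """Hypothetical graph node for illustration; not Hatchet's own class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parents = []
        self.children = []
        # The callpath is the tuple of names from the root down to this node.
        if parent is None:
            self.callpath = (name,)
        else:
            self.callpath = parent.callpath + (name,)
            self.parents.append(parent)
            parent.children.append(self)


# Graph part: edge relationships between nodes.
main = Node("main")
solve = Node("solve", parent=main)
allreduce = Node("MPI_Allreduce", parent=solve)

# Data part: one row per (node callpath, MPI rank) pair, mimicking a
# two-level index. The metric values are made up for the example.
table = {
    (main.callpath, 0): {"time": 10.0},
    (main.callpath, 1): {"time": 11.0},
    (allreduce.callpath, 0): {"time": 2.5},
    (allreduce.callpath, 1): {"time": 3.0},
}


def rows_for_node(table, node):
    """Collect the per-rank rows recorded for one node."""
    return {rank: row for (path, rank), row in table.items()
            if path == node.callpath}
```

Looking up `rows_for_node(table, allreduce)` returns one row per rank, which is the same access pattern a (node, rank) multiindex gives on a real dataframe.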
The squash operation is always performed after a filter operation is
done on the dataframe.
Squash removes nodes from the graph that were
filtered out by a previous
filter operation. When one or more nodes on
a path are removed from the graph, the nearest alive ancestor is connected by
an edge to the nearest alive descendant on the path, which becomes its child.
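The reconnection rule can be sketched as a depth-first walk that remembers the nearest alive ancestor seen so far. This is a minimal illustration, not Hatchet's code: the graph is assumed to be acyclic and is represented as a plain dictionary from node name to child names, and the function name `squash_graph` and its parameters are hypothetical.

```python
def squash_graph(children, roots, keep):
    """Rebuild the graph using only the nodes in `keep`.

    children: dict mapping each node to the list of its children
    roots:    list of root nodes of the original graph
    keep:     set of nodes that survived the filter ("alive" nodes)
    Returns (new_roots, new_children).
    """
    new_children = {node: [] for node in keep}
    new_roots = []

    def visit(node, alive_ancestor):
        if node in keep:
            if alive_ancestor is None:
                # No alive node above this one: it becomes a root.
                if node not in new_roots:
                    new_roots.append(node)
            elif node not in new_children[alive_ancestor]:
                # Connect the nearest alive ancestor directly to this node.
                new_children[alive_ancestor].append(node)
            alive_ancestor = node
        for child in children.get(node, []):
            visit(child, alive_ancestor)

    for root in roots:
        visit(root, None)
    return new_roots, new_children


# Example: "foo" and "baz" were filtered out, so "qux" is reattached to "main".
graph = {"main": ["foo", "bar"], "foo": ["baz"], "baz": ["qux"],
         "bar": [], "qux": []}
squashed_roots, squashed_children = squash_graph(
    graph, ["main"], {"main", "bar", "qux"})
```

If the original root is itself filtered out, its nearest alive descendants become roots, which is one way a squashed graph can end up disconnected.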
Groupby and Aggregate:
filter takes a user-supplied function and applies it to every
row in the dataframe. The resulting boolean Series or dataframe is used as a
mask, so only the rows for which the function evaluates to True are returned.
The returned graphframe preserves the graph provided as input.
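The row-wise behavior of filter can be sketched without pandas. This is an illustration under stated assumptions rather than Hatchet's API: the dataframe is stood in for by a dictionary keyed by (node, rank), and `filter_frame` is a hypothetical helper name.

```python
def filter_frame(graph, table, predicate):
    """Keep only the rows for which predicate(row) is true.

    The graph is returned untouched, mirroring the description that
    the filtered result preserves the input graph.
    """
    kept = {key: row for key, row in table.items() if predicate(row)}
    return graph, kept


# Made-up example data: rows keyed by (node name, MPI rank).
example_graph = {"main": ["solve"], "solve": []}
example_table = {
    ("main", 0): {"time": 10.0},
    ("main", 1): {"time": 11.0},
    ("solve", 0): {"time": 0.5},
    ("solve", 1): {"time": 4.0},
}

same_graph, filtered = filter_frame(
    example_graph, example_table, lambda row: row["time"] > 1.0)
```

Because the graph still contains nodes whose rows may all have been dropped, a squash step afterwards is what brings the graph back in line with the filtered data.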