Hatchet Documentation

Hatchet is a Python library for analyzing hierarchical performance data, such as data derived from calling context trees, call graphs, callpath traces, or timers on nested regions. It provides operations for analyzing a single hierarchical data set and for comparing multiple data sets.

Download and Install

Hatchet is available on GitHub.

Install

Hatchet requires a Python installation (2.7+), matplotlib, numpy and pandas.

User Guide

Hatchet is a Python tool that simplifies the analysis of hierarchical performance data such as calling context trees. Hatchet uses pandas dataframes to store the data for each node of the hierarchy. The graph relationships between the nodes are stored in a separate data structure, which is kept consistent with the dataframe.

Supported Input File Formats

Hatchet currently supports two input file formats:

  • HPCToolkit database: generated by running hpcprof-mpi to post-process the raw measurements directory output by HPCToolkit.
  • Caliper JSON-split file: generated either by running cali-query on the raw Caliper data or by enabling the mpireport service when using Caliper.

Graphframe

The graphframe is Hatchet's main data structure. It stores the performance data read from an HPCToolkit database or a Caliper JSON-split file. The raw input data is typically a tree. However, subsequent operations on the tree can create new edges that turn it into a graph, so the input data is stored as a directed graph. A graphframe consists of a graph object, which stores the edge relationships between nodes, and a dataframe, which stores the metrics (numerical data) and categorical data associated with each node.
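The pairing of a graph with a dataframe can be illustrated with a minimal sketch. This is not Hatchet's implementation: the class name GraphFrameSketch, its fields, and the is_consistent check are hypothetical names chosen for illustration, and the example assumes Python 3.7+ for dataclasses.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class GraphFrameSketch:
    """Hypothetical container pairing a directed graph with a dataframe."""

    edges: dict  # node name -> list of child node names
    dataframe: pd.DataFrame  # one row per node, indexed by node name

    def graph_nodes(self):
        # Collect every node that appears as a parent or as a child.
        nodes = set(self.edges)
        for kids in self.edges.values():
            nodes.update(kids)
        return nodes

    def is_consistent(self):
        # Every node that has data must also exist in the graph.
        return set(self.dataframe.index) <= self.graph_nodes()


gf = GraphFrameSketch(
    edges={"main": ["init", "solve"], "solve": ["MPI_Wait"]},
    dataframe=pd.DataFrame(
        {"time": [10.0, 1.0, 7.0, 3.0]},
        index=pd.Index(["main", "init", "solve", "MPI_Wait"], name="node"),
    ),
)
```

Keeping the structure (edges) apart from the measurements (rows) lets table-style operations run directly in pandas, while graph operations only need to touch the edge map.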

Graph

The graph can be connected or disconnected (that is, it can have multiple roots), and a node can have more than one parent and more than one child. Each node stores its callpath, which is a tuple of the node names from the root to that node. The callpath is used as one of the indices in the dataframe.
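As a rough sketch of how a node might carry its callpath, consider the hypothetical Node class below. The class and attribute names are assumptions made for illustration and are not taken from the hatchet.node module. For simplicity the callpath is derived from a single parent at construction time.

```python
class Node:
    """Hypothetical node holding a name, its neighbors, and its callpath."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parents = []
        self.children = []
        if parent is None:
            # A root's callpath contains only its own name.
            self.callpath = (name,)
        else:
            # Extend the parent's callpath with this node's name.
            self.callpath = parent.callpath + (name,)
            self.parents.append(parent)
            parent.children.append(self)


main = Node("main")
solve = Node("solve", parent=main)
mpi_wait = Node("MPI_Wait", parent=solve)
```

Because tuples are hashable, a callpath such as mpi_wait.callpath can serve directly as a dictionary key or index label.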

Dataframe

The dataframe holds all the numerical and categorical data associated with each node. Because call tree data is typically recorded per process, the dataframe is indexed by a MultiIndex composed of the node and the MPI rank.
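The (node, rank) indexing scheme can be tried out with plain pandas. The sketch below uses made-up node names and timings, and represents each node by a string label rather than by a node object; the column names are assumptions.

```python
import pandas as pd

# Two nodes measured on two MPI ranks: (main, 0), (main, 1), (solve, 0), (solve, 1).
index = pd.MultiIndex.from_product(
    [["main", "solve"], [0, 1]], names=["node", "rank"]
)
df = pd.DataFrame(
    {
        "time": [10.0, 12.0, 4.0, 6.0],  # numerical metric
        "name": ["main", "main", "solve", "solve"],  # categorical data
    },
    index=index,
)

# All ranks for a single node.
solve_rows = df.xs("solve", level="node")

# A single (node, rank) entry.
main_rank1_time = df.loc[("main", 1), "time"]

# Average each node's metric across ranks.
mean_time = df.groupby(level="node")["time"].mean()
```

With this layout, per-rank detail is preserved while per-node summaries remain a one-line groupby on the node level.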

Graph-centric Operations

Squash: The squash operation is always performed after a filter operation on the dataframe. Squash removes from the graph the nodes that were filtered out by that filter. When one or more nodes on a path are removed, the nearest remaining ancestor is connected by an edge to the nearest remaining descendant on that path.
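The rewiring rule described for squash can be sketched on a plain adjacency dictionary. This is an illustrative sketch under stated assumptions, not Hatchet's implementation: the function name, its signature, and the use of a keep set to represent the surviving nodes are all hypothetical, and the input graph is assumed to be acyclic.

```python
def squash(children, roots, keep):
    """Return (new_roots, new_children) containing only the nodes in `keep`.

    children: dict mapping a node to the list of its children
    roots:    list of root nodes
    keep:     set of nodes that survived the filter
    """
    new_children = {node: [] for node in keep}
    new_roots = []

    def visit(node, nearest_kept):
        if node in keep:
            if nearest_kept is None:
                # No surviving ancestor: this node becomes a root.
                if node not in new_roots:
                    new_roots.append(node)
            elif node not in new_children[nearest_kept]:
                # Connect the nearest surviving ancestor to this node.
                new_children[nearest_kept].append(node)
            nearest_kept = node
        for child in children.get(node, []):
            visit(child, nearest_kept)

    for root in roots:
        visit(root, None)
    return new_roots, new_children


children = {
    "main": ["init", "solve"],
    "solve": ["kernel"],
    "kernel": ["MPI_Wait"],
}
new_roots, new_children = squash(
    children, roots=["main"], keep={"main", "solve", "MPI_Wait"}
)
```

In this example "kernel" is dropped, so "solve" is connected directly to "MPI_Wait", and the filtered-out leaf "init" simply disappears.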

Union:

Diff:

Groupby and Aggregate:

Dataframe-centric Operations

Filter: Filter takes a user-supplied function and applies it to every row of the dataframe. The resulting boolean Series or dataframe is used to select only the rows for which the function returns True. The returned graphframe preserves the graph provided as input.
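The row-wise filtering described above can be approximated with plain pandas. The sketch below is an assumption about the general mechanism, not Hatchet's own filter method; the predicate is_expensive, the column name, and the data are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"time": [10.0, 12.0, 0.5, 6.0]},
    index=pd.MultiIndex.from_tuples(
        [("main", 0), ("main", 1), ("init", 0), ("solve", 0)],
        names=["node", "rank"],
    ),
)


def is_expensive(row):
    # User-supplied predicate: keep rows that took longer than one second.
    return row["time"] > 1.0


# Apply the predicate to every row; the result is a boolean Series.
mask = df.apply(is_expensive, axis=1)

# Keep only the rows for which the predicate returned True.
filtered = df[mask]
```

Only the table changes here. Any graph structure associated with the rows is untouched, which is why a separate squash step is needed to drop the filtered-out nodes from the graph.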

Fill:

hatchet package

Submodules

hatchet.caliper_reader module

hatchet.graph module

hatchet.graphframe module

hatchet.hpctoolkit_reader module

hatchet.node module

Module contents

Subpackages

hatchet.external package

Submodules
hatchet.external.printtree module
Module contents

hatchet.util package

Submodules
hatchet.util.timer module
Module contents
