hatchet package

Submodules

hatchet.frame module

class hatchet.frame.Frame(attrs=None, **kwargs)[source]

Bases: object

The frame index for a node. The node only stores its frame.

Parameters:attrs (dict) – dictionary of attributes and values
copy()[source]
get(name, default=None)[source]
tuple_repr

Make a tuple of attributes and values based on reader.

values(names)[source]

Return a tuple of attribute values from this Frame.
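As an illustrative sketch (not hatchet's actual implementation), a Frame can be pictured as a thin wrapper over an attribute dictionary, with values() pulling out the requested attributes:

```python
# Illustrative sketch of Frame semantics; not hatchet's implementation.
class Frame:
    def __init__(self, attrs=None, **kwargs):
        # Attributes may come from a dict, keyword arguments, or both.
        self.attrs = dict(attrs or {})
        self.attrs.update(kwargs)

    def get(self, name, default=None):
        return self.attrs.get(name, default)

    def values(self, names):
        # A single attribute name yields a single value; a list of
        # names yields a tuple of values.
        if isinstance(names, str):
            return self.attrs.get(names)
        return tuple(self.attrs.get(n) for n in names)

f = Frame(name="main", file="main.c")
print(f.values(["name", "file"]))  # -> ('main', 'main.c')
```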

hatchet.graph module

class hatchet.graph.Graph(roots)[source]

Bases: object

A possibly multi-rooted tree or graph from one input dataset.

copy(old_to_new=None)[source]

Create and return a copy of this graph.

Parameters:old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node
enumerate_depth()[source]
enumerate_traverse()[source]
find_merges()[source]

Find nodes that have the same parent and frame.

Find nodes that have the same parent and duplicate frame, and return a mapping from nodes that should be eliminated to nodes they should be merged into.

Returns:dictionary from nodes to their merge targets
Return type:(dict)
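The idea behind find_merges() can be sketched on plain data: group each parent's children by frame, and map every duplicate to the first sibling seen with that frame. This is an illustration of the documented contract, not hatchet's implementation:

```python
# Illustrative sketch of the find_merges() idea; not hatchet's code.
# Siblings sharing a frame are merge candidates: each duplicate maps
# to the surviving (first-seen) sibling with that frame.
def find_sibling_merges(parent_children):
    """parent_children: dict mapping a parent id to a list of
    (node_id, frame) pairs for its children."""
    merges = {}
    for children in parent_children.values():
        seen = {}  # frame -> node id that survives the merge
        for node_id, frame in children:
            if frame in seen:
                merges[node_id] = seen[frame]  # eliminate node_id
            else:
                seen[frame] = node_id
    return merges

children = {"root": [("n1", "foo"), ("n2", "bar"), ("n3", "foo")]}
print(find_sibling_merges(children))  # -> {'n3': 'n1'}
```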
static from_lists(*roots)[source]

Convenience method to invoke Node.from_lists() on each root value.

is_tree()[source]

True if this graph is a tree, false otherwise.

merge_nodes(merges)[source]

Merge some nodes in a graph into others.

merges is a dictionary keyed by old nodes, with values equal to the nodes that they need to be merged into. Old nodes’ parents and children are connected to the new node.

Parameters:merges (dict) – dictionary from source nodes -> targets
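The rewiring described above can be sketched on an adjacency dict: every edge touching an old node is redirected to its merge target. This is a simplified illustration, not hatchet's implementation:

```python
# Illustrative sketch of merge_nodes() rewiring on a plain adjacency
# dict; not hatchet's implementation, which operates on Node objects.
def merge_nodes(children, merges):
    """children: node -> list of child nodes.
    merges: old node -> node it should be merged into."""
    resolve = lambda n: merges.get(n, n)
    new_children = {}
    for node, kids in children.items():
        node = resolve(node)
        bucket = new_children.setdefault(node, [])
        for kid in kids:
            kid = resolve(kid)
            # Deduplicate edges and avoid self-loops created by the merge.
            if kid not in bucket and kid != node:
                bucket.append(kid)
    return new_children

children = {"a": ["b1", "b2"], "b1": ["c"], "b2": ["d"]}
print(merge_nodes(children, {"b2": "b1"}))  # -> {'a': ['b1'], 'b1': ['c', 'd']}
```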
normalize()[source]
traverse(order='pre', attrs=None, visited=None)[source]

Preorder traversal of all roots of this Graph.

Parameters:attrs (list or str, optional) – If provided, extract these fields from nodes while traversing and yield them. See traverse() for details.

Only preorder traversal is currently supported.

union(other, old_to_new=None)[source]

Create the union of self and other and return it as a new Graph.

This creates a new graph and does not modify self or other. The new Graph has entirely new nodes.

Parameters:
  • other (Graph) – another Graph
  • old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node
Returns:

new Graph containing all nodes and edges from self and other

Return type:

(Graph)

hatchet.graph.index_by(attr, objects)[source]

Put objects into lists based on the value of an attribute.

Returns:dictionary of lists of objects, keyed by attribute value
Return type:(dict)
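A minimal sketch of this helper's documented contract (not hatchet's implementation) is a one-pass grouping by attribute value:

```python
# Illustrative sketch of index_by(attr, objects); not hatchet's code.
from collections import defaultdict

def index_by(attr, objects):
    """Group objects into lists keyed by the value of one attribute."""
    index = defaultdict(list)
    for obj in objects:
        index[getattr(obj, attr)].append(obj)
    return dict(index)

class Rec:
    def __init__(self, name):
        self.name = name

grouped = index_by("name", [Rec("a"), Rec("b"), Rec("a")])
print({k: len(v) for k, v in grouped.items()})  # -> {'a': 2, 'b': 1}
```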

hatchet.graphframe module

exception hatchet.graphframe.EmptyFilter[source]

Bases: Exception

Raised when a filter would otherwise return an empty GraphFrame.

class hatchet.graphframe.GraphFrame(graph, dataframe, exc_metrics=None, inc_metrics=None, default_metric='time')[source]

Bases: object

An input dataset is read into an object of this type, which includes a graph and a dataframe.

add(other)[source]

Returns the column-wise sum of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:new graphframe
Return type:(GraphFrame)
copy()[source]

Return a shallow copy of the graphframe.

This copies the DataFrame, but the Graph is shared between self and the new GraphFrame.

deepcopy()[source]

Return a copy of the graphframe.

div(other)[source]

Returns the column-wise float division of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:new graphframe
Return type:(GraphFrame)
drop_index_levels(function=<function mean>)[source]

Drop all index levels but node.

filter(filter_obj, squash=True, num_procs=2)[source]

Filter the dataframe using a user-supplied function.

Note: Operates in parallel on user-supplied lambda functions.

Parameters:
  • filter_obj (callable, list, or QueryMatcher) – the filter to apply to the GraphFrame.
  • squash (boolean, optional) – if True, automatically call squash for the user.
static from_caliper(filename, query)[source]

Read in a Caliper cali file.

Parameters:
  • filename (str) – name of a Caliper output file in .cali format
  • query (str) – cali-query in CalQL format
static from_caliper_json(filename_or_stream)[source]

Read in a Caliper cali-query JSON-split file or an open file object.

Parameters:filename_or_stream (str or file-like) – name of a Caliper JSON-split output file, or an open file object to read one
static from_cprofile(filename)[source]

Read in a pstats/prof file generated using python’s cProfile.

static from_gprof_dot(filename)[source]

Read in a DOT file generated by gprof2dot.

static from_hpctoolkit(dirname)[source]

Read an HPCToolkit database directory into a new GraphFrame.

Parameters:dirname (str) – parent directory of an HPCToolkit experiment.xml file
Returns:new GraphFrame containing HPCToolkit profile data
Return type:(GraphFrame)
static from_lists(*lists)[source]

Make a simple GraphFrame from lists.

This creates a Graph from lists (see Graph.from_lists()) and uses it as the index for a new GraphFrame. Every node in the new graph has exclusive time of 1 and inclusive time is computed automatically.
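The "inclusive time is computed automatically" part can be illustrated on the nested-list form itself: with exclusive time 1 per node, a node's inclusive time is the size of its subtree. This is a hypothetical sketch of that arithmetic, not hatchet's code:

```python
# Hypothetical sketch of deriving inclusive time when every node has
# exclusive time 1, as in GraphFrame.from_lists(); not hatchet's code.
def inclusive_time(node):
    """node is a nested list [name, child, child, ...]; a bare string
    is a leaf. Inclusive time = own exclusive time (1) + descendants."""
    if isinstance(node, str):
        return 1
    return 1 + sum(inclusive_time(child) for child in node[1:])

# ["a", ["b", "d", "e"], ["c", "f", "g"]] has 7 nodes in total.
print(inclusive_time(["a", ["b", "d", "e"], ["c", "f", "g"]]))  # -> 7
```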

static from_literal(graph_dict)[source]

Create a GraphFrame from a list of dictionaries.

static from_pyinstrument(filename)[source]

Read in a JSON file generated using Pyinstrument.

static from_timemory(input=None, select=None, **_kwargs)[source]

Read in timemory data.

Links:
https://github.com/NERSC/timemory https://timemory.readthedocs.io
Parameters:
  • input (str or file-stream or dict or None) –

    Valid argument types are:

    1. Filename for a timemory JSON tree file
    2. Open file stream to one of these files
    3. Dictionary from timemory JSON tree

    Currently, timemory supports two JSON layouts: flat and tree. The former is a 1D-array representation of the hierarchy, which encodes the hierarchy via indentation schemes in the labels and is not compatible with hatchet. The latter is a hierarchical representation of the data and is the required JSON layout when using hatchet. Timemory JSON tree files typically have the extension ".tree.json".

    If input is None, this assumes that timemory has been recording data within the application that is using hatchet. In this situation, this method will attempt to import the data directly from timemory.

    At the time of this writing, the direct data import will:

    1. Stop any currently collecting components
    2. Aggregate child thread data of the calling thread
    3. Clear all data on the child threads
    4. Aggregate the data from any MPI and/or UPC++ ranks

    Thus, if MPI or UPC++ is used, every rank must call this routine. The zeroth rank will hold the aggregation; all other ranks will hold only their rank-specific data.

    Whether or not the per-thread and per-rank data itself is combined is controlled by the collapse_threads and collapse_processes attributes in the timemory.settings submodule.

    In the C++ API, it is possible to apply only step 1, so that data can be obtained for an individual thread and/or rank without aggregation. This is not currently available from Python; however, it can be made available upon request via a GitHub issue.

  • select (list of str) – a list of strings which match the component enumeration names, e.g. ["cpu_clock"]
  • per_thread (boolean) – ensures that when applying filters to the graphframe, frames with identical name/file/line/etc. info but from different threads are not combined
  • per_rank (boolean) – ensures that when applying filters to the graphframe, frames with identical name/file/line/etc. info but from different ranks are not combined
groupby_aggregate(groupby_function, agg_function)[source]

Groupby-aggregate dataframe and reindex the Graph.

Reindex the graph to match the groupby-aggregated dataframe.

Update the frame attributes to contain those columns in the dataframe index.

Parameters:
  • self (graphframe) – self’s graphframe
  • groupby_function – groupby function on dataframe
  • agg_function – aggregate function on dataframe
Returns:

new graphframe with reindexed graph and groupby-aggregated dataframe

Return type:

(GraphFrame)

mul(other)[source]

Returns the column-wise float multiplication of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:new graphframe
Return type:(GraphFrame)
show_metric_columns()[source]

Returns a list of dataframe column labels.

squash()[source]

Rewrite the Graph to include only nodes present in the DataFrame’s rows.

This can be used to simplify the Graph, or to normalize Graph indexes between two GraphFrames.

sub(other)[source]

Returns the column-wise difference of two graphframes as a new graphframe.

This graphframe is the union of self’s and other’s graphs, and does not modify self or other.

Returns:new graphframe
Return type:(GraphFrame)
subgraph_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)[source]

Compute sum of elements in subgraphs.

For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.

This algorithm is worst-case quadratic in the size of the graph, so we try to call subtree_sum if we can. In general, there is not a particularly efficient algorithm known for subgraph sums, so this does about as well as we know how.

Parameters:
  • columns (list of str) – names of columns to sum (default: all columns)
  • out_columns (list of str) – names of columns to store results (default: in place)
  • function (callable) – associative operator used to sum elements; the sum of an all-NA series is NaN (default: sum(min_count=1))
subtree_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)[source]

Compute sum of elements in subtrees. Valid only for trees.

For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.

This algorithm will count nodes with in-degree higher than one multiple times – i.e., it is only correct for trees. Prefer subgraph_sum (which calls subtree_sum when it can), unless you have a good reason not to.

Parameters:
  • columns (list of str) – names of columns to sum (default: all columns)
  • out_columns (list of str) – names of columns to store results (default: in place)
  • function (callable) – associative operator used to sum elements; the sum of an all-NA series is NaN (default: sum(min_count=1))
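The subtree recurrence described above can be sketched as a postorder recursion over a plain dict tree. This illustrates the documented semantics, not hatchet's implementation, which operates on a GraphFrame:

```python
# Illustrative sketch of the subtree_sum() recurrence; not hatchet's code.
def subtree_sum(tree, values, root):
    """tree: node -> list of children; values: node -> exclusive value.
    Returns node -> inclusive (subtree) value via postorder recursion.
    Correct only for trees: a node with in-degree > 1 would be counted
    once per path, which is why subgraph_sum is preferred for DAGs."""
    out = {}
    def visit(node):
        total = values[node]
        for child in tree.get(node, []):
            total += visit(child)
        out[node] = total
        return total
    visit(root)
    return out

tree = {"a": ["b", "c"], "b": ["d"]}
values = {"a": 1, "b": 2, "c": 3, "d": 4}
print(subtree_sum(tree, values, "a"))  # -> {'d': 4, 'b': 6, 'c': 3, 'a': 10}
```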
to_dot(metric=None, name='name', rank=0, thread=0, threshold=0.0)[source]

Write the graph in the graphviz dot format: https://www.graphviz.org/doc/info/lang.html

to_flamegraph(metric=None, name='name', rank=0, thread=0, threshold=0.0)[source]

Write the graph in the folded stack output required by FlameGraph http://www.brendangregg.com/flamegraphs.html
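FlameGraph's folded-stack input is one line per call path: frame names joined by semicolons, then a space and the metric value. A hypothetical sketch of producing that format (not hatchet's implementation):

```python
# Hypothetical sketch of the folded-stack format that to_flamegraph()
# targets; not hatchet's implementation.
def fold_stacks(samples):
    """samples: list of (call_path_tuple, metric_value) pairs.
    Emits one 'frame;frame;frame value' line per call path."""
    return "\n".join(";".join(path) + " " + str(value)
                     for path, value in samples)

samples = [(("main", "solve"), 8), (("main", "io"), 2)]
print(fold_stacks(samples))
# main;solve 8
# main;io 2
```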

to_literal(name='name', rank=0, thread=0)[source]

Format this graph as a list of dictionaries for Roundtrip visualizations.

tree(metric_column=None, precision=3, name_column='name', expand_name=False, context_column='file', rank=0, thread=0, depth=10000, highlight_name=False, invert_colormap=False)[source]

Format this graphframe as a tree and return the resulting string.

unify(other)[source]

Returns a unified graphframe.

Ensure self and other have the same graph and same node IDs. This may change the node IDs in the dataframe.

Update the graphs in the graphframe if they differ.

update_inclusive_columns()[source]

Update inclusive columns (typically after operations that rewire the graph).

exception hatchet.graphframe.InvalidFilter[source]

Bases: Exception

Raised when an invalid argument is passed to the filter function.

hatchet.graphframe.parallel_apply(filter_function, dataframe, queue)[source]

A function called in parallel, which performs a pandas apply on part of a dataframe and returns the results via a multiprocessing queue.

hatchet.node module

exception hatchet.node.MultiplePathError[source]

Bases: Exception

Raised when a node is asked for a single path but has multiple.

class hatchet.node.Node(frame_obj, parent=None, hnid=-1, depth=-1)[source]

Bases: object

A node in the graph. The node only stores its frame.

add_child(node)[source]

Adds a child to this node’s list of children.

add_parent(node)[source]

Adds a parent to this node’s list of parents.

copy()[source]

Copy this node without preserving parents or children.

dag_equal(other, vs=None, vo=None)[source]

Check if DAG rooted at self has the same structure as that rooted at other.

classmethod from_lists(lists)[source]

Construct a hierarchy of nodes from recursive lists.

For example, this will construct a simple tree:

Node.from_lists(
    ["a",
        ["b", "d", "e"],
        ["c", "f", "g"],
    ]
)
     a
    / \
   b   c
 / |   | \
d  e   f  g

And this will construct a simple diamond DAG:

d = Node(Frame(name="d"))
Node.from_lists(
    ["a",
        ["b", d],
        ["c", d]
    ]
)
  a
 / \
b   c
 \ /
  d

In the above examples, 'a' represents a Node with its frame == Frame(name="a").

path(attrs=None)[source]

Path to this node from root. Raises if there are multiple paths.

Parameters:attrs (str or list, optional) – attribute(s) to extract from Frames

This is useful for trees (where each node only has one path), as it just gets the only element from self.paths. This will fail with a MultiplePathError if there is more than one path to this node.

paths(attrs=None)[source]

List of tuples, one for each path from this node to any root.

Parameters:attrs (str or list, optional) – attribute(s) to extract from Frames

Paths are tuples of Frame objects, or, if attrs is provided, they are paths containing the requested attributes.

traverse(order='pre', attrs=None, visited=None)[source]

Traverse the tree depth-first and yield each node.

Parameters:
  • order (str) – “pre” or “post” for preorder or postorder (default: pre)
  • attrs (list or str, optional) – if provided, extract these fields from nodes while traversing and yield them
  • visited (dict, optional) – dictionary in which each visited node’s in-degree will be stored
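The interplay of these parameters can be sketched on a plain adjacency dict: a depth-first generator that honors "pre"/"post" order and records each node's observed in-degree in visited, yielding each node only once even in a DAG. This mirrors the documented parameters but is not hatchet's implementation:

```python
# Illustrative sketch of Node.traverse() semantics; not hatchet's code.
def traverse(node, children, order="pre", visited=None):
    if visited is None:
        visited = {}
    # visited counts how many times each node was reached (its in-degree
    # as seen by the traversal).
    visited[node] = visited.get(node, 0) + 1
    if visited[node] > 1:
        return  # already yielded via another parent (DAG case)
    if order == "pre":
        yield node
    for child in children.get(node, []):
        yield from traverse(child, children, order, visited)
    if order == "post":
        yield node

children = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}  # diamond DAG
print(list(traverse("a", children)))          # -> ['a', 'b', 'd', 'c']
print(list(traverse("a", children, "post")))  # -> ['d', 'b', 'c', 'a']
```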
hatchet.node.traversal_order(node)[source]

Deterministic key function for sorting nodes in traversals.

hatchet.query_matcher module

exception hatchet.query_matcher.InvalidQueryFilter[source]

Bases: Exception

Raised when a query filter does not have valid syntax.

exception hatchet.query_matcher.InvalidQueryPath[source]

Bases: Exception

Raised when a query does not have the correct syntax.

class hatchet.query_matcher.QueryMatcher(query=None)[source]

Bases: object

Process and apply queries to GraphFrames.

apply(gf)[source]

Apply the query to a GraphFrame.

Parameters:gf (GraphFrame) – the GraphFrame on which to apply the query.
Returns:A list of lists representing the set of paths that match this query.
Return type:(list)
match(wildcard_spec='.', filter_func=<function QueryMatcher.<lambda>>)[source]

Start a query with a root node described by the arguments.

Parameters:
  • wildcard_spec (str, optional, ".", "*", or "+") – the wildcard status of the node (follows standard Regex syntax)
  • filter_func (callable, optional) – a callable accepting only a row from a Pandas DataFrame that is used to filter this node in the query
Returns:

The instance of the class that called this function (enables fluent design).

Return type:

(QueryMatcher)

rel(wildcard_spec='.', filter_func=<function QueryMatcher.<lambda>>)[source]

Add another edge and node to the query.

Parameters:
  • wildcard_spec (str, optional, ".", "*", or "+") – the wildcard status of the node (follows standard Regex syntax)
  • filter_func (callable, optional) – a callable accepting only a row from a Pandas DataFrame that is used to filter this node in the query
Returns:

The instance of the class that called this function (enables fluent design).

Return type:

(QueryMatcher)
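The fluent design that match() and rel() enable, where each call returns the instance so calls can be chained into a path query, can be sketched with a minimal builder. This illustrates the pattern only; it is not hatchet's QueryMatcher:

```python
# Minimal sketch of the fluent-builder pattern behind QueryMatcher;
# an illustration of the design, not hatchet's implementation.
class PathQuery:
    def __init__(self):
        self._steps = []  # (wildcard_spec, filter_func) per query node

    def match(self, wildcard_spec=".", filter_func=lambda row: True):
        """Start the query with a root node description."""
        self._steps.append((wildcard_spec, filter_func))
        return self  # returning self enables chaining

    def rel(self, wildcard_spec=".", filter_func=lambda row: True):
        """Append another edge and node to the query."""
        self._steps.append((wildcard_spec, filter_func))
        return self

    def __len__(self):
        return len(self._steps)

q = PathQuery().match(".", lambda row: row["name"] == "main").rel("*")
print(len(q))  # -> 2
```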

Module contents