hatchet package
Subpackages
Submodules
hatchet.frame module
hatchet.graph module
-
class hatchet.graph.Graph(roots)
Bases: object
A possibly multi-rooted tree or graph from one input dataset.
-
copy(old_to_new=None)
Create and return a copy of this graph.
Parameters: old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node
-
find_merges()
Find nodes that have the same parent and frame.
Find nodes that have the same parent and duplicate frame, and return a mapping from nodes that should be eliminated to nodes they should be merged into.
Returns: dictionary from nodes to their merge targets
Return type: (dict)
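The merge rule described above can be sketched without Hatchet itself. The snippet below is only an illustration under stated assumptions, not Hatchet's implementation: SketchNode and find_merges_sketch are hypothetical names, a frame is reduced to a plain string, and duplicate roots are not considered.

```python
class SketchNode:
    """Hypothetical stand-in for a graph node; the frame is just a string."""

    def __init__(self, frame):
        self.frame = frame
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child


def find_merges_sketch(roots):
    """Map each duplicate sibling to the first sibling with the same frame."""
    merges = {}
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        first_by_frame = {}
        for child in node.children:
            target = first_by_frame.setdefault(child.frame, child)
            if target is not child:
                merges[child] = target  # child should be eliminated
        stack.extend(node.children)
    return merges


root = SketchNode("main")
kept = root.add_child(SketchNode("solve"))
dup = root.add_child(SketchNode("solve"))
other = root.add_child(SketchNode("io"))

merges = find_merges_sketch([root])
```

Here merges maps the second "solve" child to the first one, which is the shape of dictionary that merge_nodes() is documented to accept.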
-
static from_lists(*roots)
Convenience method to invoke Node.from_lists() on each root value.
-
merge_nodes(merges)
Merge some nodes in a graph into others.
merges is a dictionary keyed by old nodes, with values equal to the nodes that they need to be merged into. Old nodes’ parents and children are connected to the new node.
Parameters: merges (dict) – dictionary from source nodes -> targets
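As a rough illustration of what applying such a dictionary means, the sketch below rewires a graph stored as a plain child map. It is not Hatchet's code: merge_nodes_sketch is a hypothetical name, nodes are represented by strings, and the merges dictionary is assumed to contain no cycles.

```python
def merge_nodes_sketch(children, merges):
    """Return a new child map in which each merged-away node is replaced by its target."""

    def resolve(node):
        # Follow chains such as {x: y, y: z} to the final target.
        while node in merges:
            node = merges[node]
        return node

    merged = {}
    for parent, kids in children.items():
        target_kids = merged.setdefault(resolve(parent), [])
        for kid in kids:
            kid = resolve(kid)
            if kid not in target_kids:
                target_kids.append(kid)
    return merged


children = {"a": ["b1", "b2"], "b1": ["c"], "b2": ["d"], "c": [], "d": []}
result = merge_nodes_sketch(children, {"b2": "b1"})
```

After the merge, "a" has the single child "b1", and "b1" has inherited the child "d" that used to hang off "b2".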
-
traverse(order='pre', attrs=None, visited=None)
Preorder traversal of all roots of this Graph.
Parameters: attrs (list or str, optional) – If provided, extract these fields from nodes while traversing and yield them. See Node.traverse() for details.
Only preorder traversal is currently supported.
-
union(other, old_to_new=None)
Create the union of self and other and return it as a new Graph.
This creates a new graph and does not modify self or other. The new Graph has entirely new nodes.
Parameters: - other (Graph) – another Graph
- old_to_new (dict, optional) – if provided, this dictionary will be populated with mappings from old node -> new node
Returns: new Graph containing all nodes and edges from self and other
Return type: (Graph)
-
hatchet.graphframe module
-
exception hatchet.graphframe.EmptyFilter
Bases: Exception
Raised when a filter would otherwise return an empty GraphFrame.
-
class hatchet.graphframe.GraphFrame(graph, dataframe, exc_metrics=None, inc_metrics=None)
Bases: object
An input dataset is read into an object of this type, which includes a graph and a dataframe.
-
add(other, *args, **kwargs)
Returns the column-wise sum of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
Returns: new graphframe
Return type: (GraphFrame)
-
copy()
Return a shallow copy of the graphframe.
This copies the DataFrame, but the Graph is shared between self and the new GraphFrame.
-
div(other, *args, **kwargs)
Returns the column-wise float division of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
Returns: new graphframe
Return type: (GraphFrame)
-
filter(filter_obj, squash=True)
Filter the dataframe using a user-supplied function.
Parameters: - filter_obj (callable, list, or QueryMatcher) – the filter to apply to the GraphFrame.
- squash (boolean, optional) – if True, automatically call squash for the user.
-
static from_caliper(filename, query)
Read in a Caliper cali file.
Parameters: - filename (str) – name of a Caliper output file in .cali format
- query (str) – cali-query in CalQL format
-
static from_caliper_json(filename_or_stream)
Read in a Caliper cali-query JSON-split file or an open file object.
Parameters: filename_or_stream (str or file-like) – name of a Caliper JSON-split output file, or an open file object to read one
-
static from_cprofile(filename)
Read in a pstats/prof file generated using Python’s cProfile.
-
static from_hpctoolkit(dirname)
Read an HPCToolkit database directory into a new GraphFrame.
Parameters: dirname (str) – parent directory of an HPCToolkit experiment.xml file
Returns: new GraphFrame containing HPCToolkit profile data
Return type: (GraphFrame)
-
static from_lists(*lists)
Make a simple GraphFrame from lists.
This creates a Graph from lists (see Graph.from_lists()) and uses it as the index for a new GraphFrame. Every node in the new graph has an exclusive time of 1, and inclusive time is computed automatically.
-
groupby_aggregate(groupby_function, agg_function)
Groupby-aggregate dataframe and reindex the Graph.
Reindex the graph to match the groupby-aggregated dataframe.
Update the frame attributes to contain those columns in the dataframe index.
Parameters: - self (graphframe) – self’s graphframe
- groupby_function – groupby function on dataframe
- agg_function – aggregate function on dataframe
Returns: new graphframe with reindexed graph and groupby-aggregated dataframe
Return type: (GraphFrame)
-
mul(other, *args, **kwargs)
Returns the column-wise float multiplication of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
Returns: new graphframe
Return type: (GraphFrame)
-
squash()
Rewrite the Graph to include only nodes present in the DataFrame’s rows.
This can be used to simplify the Graph, or to normalize Graph indexes between two GraphFrames.
-
sub(other, *args, **kwargs)
Returns the column-wise difference of two graphframes as a new graphframe.
This graphframe is the union of self’s and other’s graphs, and does not modify self or other.
Returns: new graphframe
Return type: (GraphFrame)
-
subgraph_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)
Compute sum of elements in subgraphs.
For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.
This algorithm is worst-case quadratic in the size of the graph, so we try to call subtree_sum if we can. In general, there is not a particularly efficient algorithm known for subgraph sums, so this does about as well as we know how.
Parameters: - columns (list of str) – names of columns to sum (default: all columns)
- out_columns (list of str) – names of columns to store results (default: in place)
- function (callable) – associative operator used to sum elements, sum of an all-NA series is NaN (default: sum(min_count=1))
-
subtree_sum(columns, out_columns=None, function=<function GraphFrame.<lambda>>)
Compute sum of elements in subtrees. Valid only for trees.
For each row in the graph, out_columns will contain the element-wise sum of all values in columns for that row’s node and all of its descendants.
This algorithm will count nodes with in-degree higher than one multiple times, i.e., it is only correct for trees. Prefer using subgraph_sum (which calls subtree_sum if it can), unless you have a good reason not to.
Parameters: - columns (list of str) – names of columns to sum (default: all columns)
- out_columns (list of str) – names of columns to store results (default: in place)
- function (callable) – associative operator used to sum elements, sum of an all-NA series is NaN (default: sum(min_count=1))
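The difference between the two summation strategies shows up on a diamond-shaped DAG. The sketch below is a plain-Python illustration, not Hatchet's implementation: subtree_sum_sketch and subgraph_sum_sketch are hypothetical names, and the graph and metric are stored in ordinary dictionaries rather than in a GraphFrame.

```python
# Diamond DAG: a -> b -> d and a -> c -> d, with an exclusive value of 1 per node.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
exclusive = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}


def subtree_sum_sketch(node):
    """Naive recursion: correct on trees, double-counts shared descendants."""
    return exclusive[node] + sum(subtree_sum_sketch(c) for c in children[node])


def subgraph_sum_sketch(node):
    """Sum over the set of distinct nodes reachable from `node`."""
    reachable = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in reachable:
            reachable.add(current)
            stack.extend(children[current])
    return sum(exclusive[n] for n in reachable)


tree_style = subtree_sum_sketch("a")    # 5.0: "d" is counted via "b" and via "c"
graph_style = subgraph_sum_sketch("a")  # 4.0: "d" is counted once
```

Running one reachability walk per node is where the worst-case quadratic cost of the subgraph variant comes from in this sketch.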
-
to_dot(metric='time', name='name', rank=0, thread=0, threshold=0.0)
Write the graph in the Graphviz DOT format: https://www.graphviz.org/doc/info/lang.html
-
to_flamegraph(metric='time', name='name', rank=0, thread=0, threshold=0.0)
Write the graph in the folded stack output required by FlameGraph: http://www.brendangregg.com/flamegraphs.html
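For readers unfamiliar with the folded stack format, each output line is a semicolon-separated call path followed by a space and a value. The sketch below produces such lines from a small tree. It is an assumption-laden illustration rather than what to_flamegraph() does internally: folded_stacks_sketch is a hypothetical name, and the choice to emit each node's exclusive value is an assumption.

```python
def folded_stacks_sketch(children, exclusive, root):
    """Yield one 'frame;frame;... value' line per call path starting at `root`."""
    stack = [(root, (root,))]
    while stack:
        node, path = stack.pop()
        yield "{} {}".format(";".join(path), exclusive[node])
        # Push in reverse so children are emitted in their listed order.
        for child in reversed(children[node]):
            stack.append((child, path + (child,)))


children = {"main": ["solve", "io"], "solve": ["kernel"], "io": [], "kernel": []}
exclusive = {"main": 1, "solve": 2, "io": 3, "kernel": 4}
lines = list(folded_stacks_sketch(children, exclusive, "main"))
```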
-
to_literal(name='name', rank=0, thread=0)
Format this graph as a list of dictionaries for Roundtrip visualizations.
-
tree(metric_column='time', precision=3, name_column='name', expand_name=False, context_column='file', rank=0, thread=0, depth=10000, highlight_name=False, invert_colormap=False)
Format this graphframe as a tree and return the resulting string.
-
hatchet.node module
-
exception hatchet.node.MultiplePathError
Bases: Exception
Raised when a node is asked for a single path but has multiple.
-
class hatchet.node.Node(frame_obj, parent=None, hnid=-1, depth=-1)
Bases: object
A node in the graph. The node only stores its frame.
-
dag_equal(other, vs=None, vo=None)
Check if DAG rooted at self has the same structure as that rooted at other.
-
classmethod from_lists(lists)
Construct a hierarchy of nodes from recursive lists.
For example, this will construct a simple tree:

    Node.from_lists(
        ["a",
            ["b", "d", "e"],
            ["c", "f", "g"],
        ]
    )

         a
        / \
       b   c
      / |  | \
     d  e  f  g

And this will construct a simple diamond DAG:

    d = Node(Frame(name="d"))
    Node.from_lists(
        ["a",
            ["b", d],
            ["c", d]
        ]
    )

      a
     / \
    b   c
     \ /
      d

In the above examples, the ‘a’ represents a Node with its frame == Frame(name="a").
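The recursive-list convention can be mimicked with a small stand-in class, which may help when reasoning about how the diamond is formed. This is a sketch under assumptions, not the Hatchet implementation: SketchNode and from_lists_sketch are hypothetical names, and a frame is reduced to a name string.

```python
class SketchNode:
    """Hypothetical stand-in for a node whose frame is just a name."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []


def from_lists_sketch(spec):
    """Build nodes from a name, an existing node, or [name, child, child, ...]."""
    if isinstance(spec, SketchNode):
        return spec  # reuse an existing node, which is how the diamond is formed
    if isinstance(spec, str):
        return SketchNode(spec)
    head, *rest = spec
    node = SketchNode(head)
    for child_spec in rest:
        child = from_lists_sketch(child_spec)
        child.parents.append(node)
        node.children.append(child)
    return node


d = SketchNode("d")
root = from_lists_sketch(["a", ["b", d], ["c", d]])
b, c = root.children
```

Because the same d object is passed in twice, both b and c end up pointing at a single shared child instead of two separate leaves.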
-
path(attrs=None)
Path to this node from root. Raises if there are multiple paths.
Parameters: attrs (str or list, optional) – attribute(s) to extract from Frames
This is useful for trees (where each node only has one path), as it just gets the only element from self.paths. This will fail with a MultiplePathError if there is more than one path to this node.
-
paths(attrs=None)
List of tuples, one for each path from this node to any root.
Parameters: attrs (str or list, optional) – attribute(s) to extract from Frames
Paths are tuples of Frame objects, or, if attrs is provided, they are paths containing the requested attributes.
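The relationship between path() and paths() can be illustrated on a diamond DAG stored as a parent map. The sketch below is not Hatchet's code: paths_sketch, path_sketch, and MultiplePathSketchError are hypothetical names, nodes are plain strings, and paths are ordered root-first as an assumption.

```python
class MultiplePathSketchError(Exception):
    """Hypothetical analogue of MultiplePathError for this sketch."""


def paths_sketch(parents, node):
    """Return every root-to-node path as a tuple of names."""
    if not parents[node]:
        return [(node,)]
    found = []
    for parent in parents[node]:
        for prefix in paths_sketch(parents, parent):
            found.append(prefix + (node,))
    return found


def path_sketch(parents, node):
    """Return the only path to `node`, or raise if there are several."""
    all_paths = paths_sketch(parents, node)
    if len(all_paths) > 1:
        raise MultiplePathSketchError(
            "{} has {} paths".format(node, len(all_paths))
        )
    return all_paths[0]


# Diamond DAG: a -> b -> d and a -> c -> d
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
d_paths = paths_sketch(parents, "d")
b_path = path_sketch(parents, "b")
```

Calling path_sketch(parents, "d") raises, because "d" is reachable through both "b" and "c".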
-
traverse(order='pre', attrs=None, visited=None)
Traverse the tree depth-first and yield each node.
Parameters: - order (str) – “pre” or “post” for preorder or postorder (default: pre)
- attrs (list or str, optional) – if provided, extract these fields from nodes while traversing and yield them
- visited (dict, optional) – dictionary in which each visited node’s in-degree will be stored
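How the order and visited parameters might interact can be sketched on a child map. This is an illustration only: traverse_sketch is a hypothetical name, nodes are strings, and the dictionary here records how many times each node was reached (so the starting node is recorded with 1), which may differ in detail from what Hatchet stores.

```python
def traverse_sketch(children, node, order="pre", visited=None):
    """Yield each reachable node once, depth-first, in pre- or postorder."""
    if order not in ("pre", "post"):
        raise ValueError("order must be 'pre' or 'post'")
    if visited is None:
        visited = {}
    # Count every arrival at a node; only the first arrival descends.
    visited[node] = visited.get(node, 0) + 1
    if visited[node] > 1:
        return
    if order == "pre":
        yield node
    for child in children[node]:
        yield from traverse_sketch(children, child, order, visited)
    if order == "post":
        yield node


# Diamond DAG: a -> b -> d and a -> c -> d
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
seen = {}
pre = list(traverse_sketch(children, "a", "pre", seen))
post = list(traverse_sketch(children, "a", "post"))
```

Passing a dictionary for visited lets the caller inspect afterwards which nodes were reached more than once, here only "d".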
-
hatchet.query_matcher module
-
exception hatchet.query_matcher.InvalidQueryFilter
Bases: Exception
Raised when a query filter does not have valid syntax.
-
exception hatchet.query_matcher.InvalidQueryPath
Bases: Exception
Raised when a query does not have the correct syntax.
-
class hatchet.query_matcher.QueryMatcher(query=None)
Bases: object
Process and apply queries to GraphFrames.
-
apply(gf)
Apply the query to a GraphFrame.
Parameters: gf (GraphFrame) – the GraphFrame on which to apply the query.
Returns: A list of lists representing the set of paths that match this query.
Return type: (list)
-
match(wildcard_spec='.', filter_func=<function QueryMatcher.<lambda>>)
Start a query with a root node described by the arguments.
Parameters: - wildcard_spec (str, optional, ".", "*", or "+") – the wildcard status of the node (follows standard Regex syntax)
- filter_func (callable, optional) – a callable accepting only a row from a Pandas DataFrame that is used to filter this node in the query
Returns: The instance of the class that called this function (enables fluent design).
Return type: (QueryMatcher)
-
rel(wildcard_spec='.', filter_func=<function QueryMatcher.<lambda>>)
Add another edge and node to the query.
Parameters: - wildcard_spec (str, optional, ".", "*", or "+") – the wildcard status of the node (follows standard Regex syntax)
- filter_func (callable, optional) – a callable accepting only a row from a Pandas DataFrame that is used to filter this node in the query
Returns: The instance of the class that called this function (enables fluent design).
Return type:
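The fluent match()/rel() style and the regex-like wildcards can be illustrated with a toy matcher that checks a single linear path. This is a sketch, not QueryMatcher itself: QuerySketch, matches_path, and _match are hypothetical names, rows are plain dictionaries instead of Pandas rows, the default predicate accepting every row is an assumption, and any spec other than "*" or "+" is treated as ".".

```python
class QuerySketch:
    """Hypothetical fluent builder that records (wildcard, predicate) steps."""

    def __init__(self):
        self.steps = []

    def match(self, wildcard_spec=".", filter_func=lambda row: True):
        self.steps = [(wildcard_spec, filter_func)]  # start a new query
        return self

    def rel(self, wildcard_spec=".", filter_func=lambda row: True):
        self.steps.append((wildcard_spec, filter_func))
        return self

    def matches_path(self, rows):
        """Check whether a whole path (a list of row-like dicts) satisfies the query."""
        return _match(self.steps, list(rows))


def _match(steps, rows):
    if not steps:
        return not rows
    (spec, pred), rest = steps[0], steps[1:]
    if spec == "*":
        # Zero nodes, or consume one matching node and stay on this step.
        return _match(rest, rows) or (
            bool(rows) and pred(rows[0]) and _match(steps, rows[1:])
        )
    if not rows or not pred(rows[0]):
        return False
    if spec == "+":
        # One node consumed; any further nodes behave like "*".
        return _match([("*", pred)] + rest, rows[1:])
    return _match(rest, rows[1:])  # "." matches exactly one node


query = (
    QuerySketch()
    .match(".", lambda row: row["name"] == "main")
    .rel("*")
    .rel(".", lambda row: row["time"] > 5.0)
)
path = [
    {"name": "main", "time": 1.0},
    {"name": "solve", "time": 2.0},
    {"name": "kernel", "time": 9.0},
]
```

Returning self from each call is what allows the chained style; the real apply() works on a whole GraphFrame and returns the matching paths rather than a single boolean.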
-