Replacement and Pattern Matching¶
We wish to define an API method on the HUGR that allows replacement of a specified subgraph with a specified replacement graph.
More ambitiously, we also wish to facilitate pattern-matching on the HUGR.
Replacement¶
Definitions¶
If n is either a DFG or a CFG node, a set S of nodes in the sibling graph under n is convex (DFG-convex or CFG-convex respectively) if every node on every path in the sibling graph that starts and ends in S is itself in S.
The meaning of “convex” is: if A and B are nodes in the convex set S, then any sibling node on a path from A to B is also in S.
API methods¶
There are the following primitive operations.
Replacement methods¶
SimpleReplace¶
This method is used for simple replacement of dataflow subgraphs consisting of leaf nodes. It works by replacing a convex induced subgraph of the main Hugr with a replacement Hugr having the same signature.
To this end, we specify a SiblingSubgraph in terms of a set \(X\) of nodes of
the containing Hugr, an input signature \(I\) and an output signature \(O\). \(I\) is
represented as a vector of vectors of input ports of nodes of \(X\), where the
outer vector corresponds to the signature and each inner vector corresponds to
one or more copies of a given input into the subgraph (all connected to the
same output port of the containing Hugr). \(O\) is represented as a vector of
output ports of nodes in \(X\) (possibly with repeats, which correspond to
copies).
Given a SiblingSubgraph \(S = (X, I, O)\) of a Hugr \(H\), and a DFG-rooted Hugr
\(H^\prime\) with an input signature matching the outer vector of \(I\) and an
output signature matching \(O\), we can form a new Hugr by replacing the nodes of
\(X\) in \(H\) with the Hugr \(H^\prime\).
Replace¶
This is the general subgraph-replacement method. Intuitively, it takes a set of
sibling nodes to remove and replace with a new set of nodes. The new set of
nodes is itself a HUGR with some “holes” (edges and nodes that get “filled in”
by the Replace operation). To fully specify the operation, some further data
are needed:
The replacement may include container nodes with no children, which adopt the children of removed container nodes and prevent those children being removed.
All new incoming edges from the retained nodes to the new nodes, all outgoing edges from the new nodes to the retained nodes, and any new edges that bypass the replacement (going between retained nodes) must be specified.
Given a set \(S\) of nodes in a hugr, let \(S^\*\) be the set of all nodes descended from nodes in \(S\) (i.e. reachable from \(S\) by following hierarchy edges), including \(S\) itself.
A NewEdgeSpec specifies an edge inserted between an existing node and a new node.
It contains the following fields:
SrcNode: the source node of the new edge.TgtNode: the target node of the new edge.EdgeKind: may beValue,Order,StaticorControlFlow.SrcPos: forValueandStaticedges, the position of the source port; forControlFlowedges, the position among the outgoing edges.TgtPos: (forValueandStaticedges only) the desired position among the incoming ports to the new node.
The Replace method takes as input:
the ID of a container node \(P\) in \(\Gamma\);
a set \(S\) of IDs of nodes that are children of \(P\)
a Hugr \(G\) whose root is a node of the same type as \(P\). Note this Hugr need not be valid, in that it may be missing:
edges to/from some ports (i.e. it may have unconnected ports)—not just Copyable dataflow outputs, which may occur even in valid Hugrs, but also incoming and/or non-Copyable dataflow ports, and ControlFlow ports,
all children for some container nodes strictly beneath the root (i.e. it may have container nodes with no outgoing hierarchy edges)
some children of the root, for container nodes that require particular children (e.g. \(\mathtt{Input}\) and/or \(\mathtt{Output}\) if \(P\) is a dataflow container, the exit node of a CFG, the required number of children of a conditional)
a map \(B\) from container nodes in \(G\) that have no children to container nodes in \(S^\*\) none of which is an ancestor of another. Let \(X\) be the set of children of nodes in the image of \(B\), and \(R = S^\* \setminus X^\*\).
a list \(\mu\_\textrm{inp}\) of
NewEdgeSpecwhich all have theirTgtNodein \(G\) andSrcNodein \(\Gamma \setminus R\);a list \(\mu\_\textrm{out}\) of
NewEdgeSpecwhich all have theirSrcNodein \(G\) andTgtNodein \(\Gamma \setminus R\), whereTgtNodeandTgtPosdescribe an existing incoming edge of that kind from a node in \(S^\*\).a list \(\mu\_\textrm{new}\) of
NewEdgeSpecwhich all have bothSrcNodeandTgtNodein \(\Gamma \setminus R\), whereTgtNodeandTgtPosdescribe an existing incoming edge of that kind from a node in \(S^\*\).
Note that considering all three \(\mu\) lists together,
the
TgtNode+TgtPoss of allNewEdgeSpecs withEdgeKind==Valuewill be uniqueand similarly for
EdgeKind==Static
The well-formedness requirements of Hugr imply that \(\mu\_\textrm{inp}\),
\(\mu\_\textrm{out}\) and \(\mu\_\textrm{new}\) may only contain NewEdgeSpecs with
certain EdgeKinds, depending on \(P\):
if \(P\) is a dataflow container,
EdgeKinds may beOrder,ValueorStaticonly (noControlFlow)if \(P\) is a CFG node,
EdgeKinds may beControlFlow,Value, orStaticonly (noOrder)if \(P\) is a Module node, there may be
ValueorStaticonly (noOrder).
(in the case of \(P\) being a CFG or Module node, any Value edges will be nonlocal, like Static edges.)
The new hugr is then derived as follows:
Make a copy in \(\Gamma\) of all the nodes in \(G\) except the root, and all edges except hierarchy edges from the root.
For each \(\sigma\_\mathrm{inp} \in \mu\_\textrm{inp}\), insert a new edge going into the new copy of the
TgtNodeof \(\sigma\_\mathrm{inp}\) according to the specification \(\sigma\_\mathrm{inp}\). Where these edges are from ports that currently have edges to nodes in \(R\), the existing edges are replaced.For each \(\sigma\_\mathrm{out} \in \mu\_\textrm{out}\), insert a new edge going out of the new copy of the
SrcNodeof \(\sigma\_\mathrm{out}\) according to the specification \(\sigma\_\mathrm{out}\). ForValueor Static edges, the target port must have an existing edge whose source is in \(R\); this edge is removed.For each \(\sigma\_\mathrm{new} \in \mu\_\textrm{new}\), insert a new edge between the existing
SrcNodeandTgtNodein \(\Gamma\). ForValueor Static edges, the target port must have an existing edge whose source is in \(R\); this edge is removed.Let \(N\) be the ordered list of the copies made in \(\Gamma\) of the children of the root node of \(G\). For each child \(C\) of \(P\) (in order), if \(C \in S\), redirect the hierarchy edge \(P \rightarrow C\) to target the next node in \(N\). Stop if there are no more nodes in \(N\). Add any remaining nodes in \(N\) to the end of \(P\)’s list of children.
For each node \((n, b = B(n))\) and for each child \(m\) of \(b\), replace the hierarchy edge from \(b\) to \(m\) with a hierarchy edge from the new copy of \(n\) to \(m\) (preserving the order).
Remove all nodes in \(R\) and edges adjoining them. (Reindexing may be necessary after this step.)
Outlining methods¶
OutlineDFG¶
Replace a DFG-convex subgraph with a single DFG node having the original nodes as children.
OutlineCFG¶
Replace a set of CFG sibling nodes with a single BasicBlock node having a CFG node child which has as its children the set of BasicBlock nodes originally specified. The set of basic blocks must satisfy constraints:
There must be at most one node in the set with incoming (controlflow) edges from nodes outside the set. Specifically,
either the set includes the CFG’s entry node, and any edges from outside the set (there may be none or more) target said entry node;
or the set does not include the CFG’s entry node, but contains exactly one node which is the target of at least one edge(s) from outside the set.
The set may not include the Exit block.
There must be exactly one edge from a node in the set to a node outside it.
Situations in which multiple nodes have edges leaving the set, or where the Exit block would be in the set, can be converted to this form by a combination of InsertIdentity operations and one Replace. For example, rather than moving the Exit block into the nested CFG:
An Identity node with a single successor can be added onto each edge into the Exit
If there is more than one edge into the Exit, these Identity nodes can then all be combined by a Replace operation changing them all for a single Identity (keeping the same number of in-edges, but reducing to one out-edge to the Exit).
The single edge to the Exit node can then be used as the exiting edge.
Inlining methods¶
These are the exact inverses of the above.
InlineDFG¶
Given a DFG node in a DSG, inline its children into the DSG.
InlineCFG¶
When a BasicBlock node has a single CFG node as a child, replace it with the children of that CFG node.
Identity insertion and removal methods¶
InsertIdentity¶
Given an edge between sibling nodes in a DSG, insert an identity<T>
node having its source as predecessor and its target as successor.
RemoveIdentity¶
Remove an identity<T> node from a DSG, wiring its predecessor to its
successor.
Order insertion and removal methods¶
InsertOrder¶
Insert an Order edge from n0 to n1 where n0 and n1 are distinct
siblings in a DSG such that there is no path in the DSG from n1 to
n0. (Thus acyclicity is preserved.) If there is already an order edge from
n0 to n1 this does nothing (but is not an error).
RemoveOrder¶
Given nodes n0 and n1, if there is an Order edge from n0 to n1,
remove it. (If there is an non-local edge from n0 to a descendent of
n1, this invalidates the hugr. TODO should this be an error?)
Insertion and removal of const loads¶
InsertConstIgnore¶
Given a Const<T> node c, and optionally P, a parent of a DSG, add a new
LoadConstant<T> node n as a child of P with a Const<T> edge
from c to n and no outgoing edges from n. Return the ID of n. If P is
omitted it defaults to the parent of c (in this case said c will
have to be in a DSG or CSG rather than under the Module Root.) If P is
provided, it must be a descendent of the parent of c.
RemoveLoadConstant¶
Given a LoadConstant<T> node n that has no outgoing edges, remove
it (and its incoming Static edge and any Order edges) from the hugr.
Insertion and removal of const nodes¶
InsertConst¶
Given a Const<T> node c and a container node P (either a Module,
a CFG node or a dataflow container), add c as a child of P.
RemoveConst¶
Given a Const<T> node c having no outgoing edges, remove c.
Usage¶
Note that we can only reattach children into empty replacement containers. This simplifies the API, and is not a serious restriction since we can use the outlining and inlining methods to target a group of nodes.
The most basic case – replacing a convex set of Op nodes in a DSG with
another graph of Op nodes having the same signature – is implemented by
SimpleReplace.
If one of the nodes in the region is a complex container node that we wish to preserve in the replacement without doing a deep copy, we can use an empty node in the replacement and have B map this node to the old one.
We can, for example, implement “turning a Conditional-node with known
Sum into a DFG-node” by a Replace where the Conditional (and its
preceding Sum) is replaced by an empty DFG and the map B specifies
the “good” child of the Conditional as the surrogate parent of the new
DFG’s children. (If the good child was just an Op, we could either
remove it and include it in the replacement, or – to avoid this overhead
– outline it in a DFG first.)
Similarly, replacement of a CFG node having a single BasicBlock child
with a DFG node can be achieved using Replace (specifying the
BasicBlock node as the surrogate parent for the new DFG’s children).
Arbitrary node insertion on Dataflow edges can be achieved using
InsertIdentity followed by Replace. Removal of a node in a DSG
having input wires and output wires of the same type can be achieved
using Replace (with a set of identity<T> nodes) followed by
RemoveIdentity.
Normalisation¶
We envisage that some kind of pass can be used after a rewrite or series of rewrites to automatically apply RemoveLoadConstant for any unused load_constants, and other such tidies. This might be global, or by tracking which parts of the Hugr have been touched.
Metadata updates on replacement¶
When requesting a replacement on Γ the caller may optionally provide metadata for the nodes of Γ and Γ’. Upon replacement, the metadata for the nodes in Γ are updated with the metadata for the nodes of Γ’ that replace them. (When child nodes are rewired, they keep their existing metadata.)
The fate of the metadata of nodes in S depends on the policy specified, as described below.
The caller may also specify a basic regular
expression
(or some other textual pattern format TBD) specifying which keys of
metadata to transfer (e.g. Foo, or * for all metadata, or Debug_*
for all metadata keyed by a string beginning with Debug_).
If no policy is specified, the default is to forget all metadata attached to the replaced subgraph (except for rewired child nodes).
Other policies could include:
to each of the new nodes added, insert a piece of metadata in its
Historysection that captures all the chosen metadata with the keys of the replaced nodes:History: {Replaced: [{node1: old_node1_metadata, node2: old_node2_metadata, ...}, {...}, ...]}whereReplacedspecifies an ordered list of replacements, and the new replacement appends to the list (or creates a new list ifReplaceddoesn’t yet exist);
to the root node of Γ, attach metadata capturing a serialization of the replacement (both the set of nodes replaced and its replacement):
History: {Replacements: [...]}
Further policies may be defined in the future; none of these polices (except for the default forgetful policy) form part of this specification.
Pattern matching¶
We would like to be able to find all subgraphs of a HUGR matching a given pattern. Exactly how the pattern is specified, and the details of the algorithm, are not discussed here; but we assume that we have an implementation of such an algorithm that works on flat (non-hierarchical) port-graphs.
It can be applied separately to each DSG within the HUGR, matching the various node types within it. Starting from the root node, we can recurse down to other DSGs within the HUGR.
It should also be possible to specify a particular DSG on which to run the pattern matching, by supplying its parent node ID.
Patterns matching edges that traverse DSGs are also possible, but will be implemented in terms of the above replacement operations, making use of the child-rewiring lists.