Commit 69f89f4b authored by Eric Myhre

traversal as a package! Big diff; docs!, etc.

All traversal/transform/selector/path code to date got a rethink.

For the longest time, I was trying to keep all the traversal code in
the main root IPLD package, thinking/hoping that the number of relevant
functions would be minimal enough that this would be fine.

Somewhere around the time where I conceded that yes, we *do* need
separate methods for writable and readonly paths (it's true that
readonly is a degenerate case of writable -- but they have different
performance characteristics!), I gave up: the number of functions in
play is too high to heap all together in the root package namespace.

This leads to a number of other surprising results: namely, that
ipld.Path *isn't* -- it's actually traversal.Path!  We *don't use*
paths anywhere else except as guidance to some forms of traversal,
and logs and progress markers during traversal.  Huh!
Despite being initially surprising, I now regard this as a *good*
code smell.

ipld.Traversal as an interface and ipld.Path.Traverse are dropped.
These turned out to be the wrong level of abstraction:
it's both missing things, and not able to return enough things.
By the time we would fix both, it's not an abstraction anymore.

Path.Traverse didn't know how to use a link loader func -- and
that's correct; it shouldn't -- and it also didn't retain and
return a stack of Nodes it traversed -- which again, it shouldn't,
because in general that would be useless overkill and resource waste...
it's just that both these things are essential in some use cases.
So!  Cut that gordian knot.

The Focus/FocusTransform/Traverse/TraverseTransform methods will now
do all recursion work themselves, internally.  The transform family
will keep stacks of nodes encountered along the way, to aid in the
(presumably) coming tree re-build; the focus family won't.
All of these methods will be able to support link loading, though
implementation of that is still upcoming.

There's a few more fields in TraversalProgress, also preparing for when
we have automatic link loading and traversal: we'll want to be able
to inform the user's callbacks about load edges.  (I'm not a fan of the
term 'block' here, but running with it for the moment.)

TraversalConfig is expected to grow.  Namely, to include link
loaders... and subsequently, probably block writers, as well!  This is
a big subject, though.  It'll probably take several commits and
possibly more than one iteration over time to hammer out a good API.

One thing that's interesting to note is that neither TraversalConfig nor
TraversalProgress will grow to contain certain things: for example,
a map of "seen" CIDs when doing link traversal.  Why?  Well, if you've
been mentally modelling things as a graph, this is surprising; surely
any graph traversal needs to remember a set of already-visited nodes.
The trick is... we're *not* doing a graph traversal -- not exactly!
Since our visitor gets a (Node,Path) *tuple*... it would be wrong to
memoize on anything other than the complete tuple; and the Path part
of the tuple means we're actually doing a *tree* walk rather than
a graph walk.  (And of course, since we're operating on DAGs,
termination guarantees are already a non-issue.)  Tada; no "seen" set.
Signed-off-by: Eric Myhre <hash@exultant.us>
parent 3aa48459
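
The "tree walk, not graph walk" point at the end of the commit message can be illustrated with a minimal sketch. Everything here is hypothetical: `node`, `walk`, and `visitedPaths` are toy stand-ins, not part of go-ipld-prime. The sketch assumes only what the message states: the visitor receives a (Node, Path) pair, so a node reachable by two paths is visited once per path, and no "seen" set is kept.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// node is a hypothetical stand-in for ipld.Node:
// either a map of children or a string leaf.
type node struct {
	kids map[string]*node
	leaf string
}

// walk calls visit once for every (path, node) pair reachable from n,
// iterating children in sorted key order. It keeps no "seen" set:
// the data is a DAG, so the recursion always terminates.
func walk(path []string, n *node, visit func(path string, n *node)) {
	visit(strings.Join(path, "/"), n)
	keys := make([]string, 0, len(n.kids))
	for k := range n.kids {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		// Copy the path so sibling visits never share a backing array.
		next := append(append([]string{}, path...), k)
		walk(next, n.kids[k], visit)
	}
}

// visitedPaths collects the path of every visit, in visit order.
func visitedPaths(root *node) []string {
	var out []string
	walk(nil, root, func(p string, _ *node) { out = append(out, p) })
	return out
}

func main() {
	shared := &node{leaf: "shared"}
	// The same node is reachable as both "a" and "b".
	root := &node{kids: map[string]*node{"a": shared, "b": shared}}
	for _, p := range visitedPaths(root) {
		fmt.Printf("%q\n", p) // prints "", then "a", then "b"
	}
}
```

The shared leaf is visited twice, once under each path. Memoizing on the node alone would have dropped one of those visits, which is the case the commit message argues against.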
package ipld
// Transform traverses an ipld.Node graph and applies a function
// to the reached node.
//
// The applicative function must return a new Node; if the returned value is
// not equal to the original reached node, the reached node will be replaced
// with the new Node, and the Transform function as a whole will return a new
// node which is in the comparable graph position to the node the Transform
// was originally launched from.
// If this update takes place deep in the graph, new intermediate nodes will
// be constructed as necessary to propagate the changes in a copy-on-write
// fashion.
// (If the returned value is identical to the original reached node, there
// is no update; and the final value returned from Transform will also be
// identical to the starting value.)
//
// Transform can be used again inside the applicative function!
// This kind of composition can be useful for doing batches of updates.
// E.g. if you have a large Node graph which contains a 100-element list, and
// you want to replace elements 12, 32, and 95 of that list:
// then you should Transform to the list first, and inside that applicative
// function's body, you can replace the entire list with a new one
// that is composed of copies of everything but those elements -- including
// using more Transform calls as desired to produce the replacement elements
// if it so happens that those replacement elements are easiest to construct
// by regarding them as incremental updates to the previous values.
//
// Note that anything you can do with the Transform function, you can also
// do with regular Node and NodeBuilder usage directly. Transform just
// does a large amount of the intermediate bookkeeping that's useful when
// creating new values which are partial updates to existing values.
func Transform(
node Node,
path Path,
applicative func(reachedNode Node, reachedPath Path) (reachedNodeReplacement Node),
) (nodeReplacement Node, err error) {
return TransformUsingTraversal(node, path.Traverse, applicative)
}
// TransformUsingTraversal is identical to Transform, but accepts a generic
// ipld.Traversal function instead of an ipld.Path for guiding its selection
// of a node to transform.
func TransformUsingTraversal(
node Node,
traversal Traversal,
applicative func(reachedNode Node, reachedPath Path) (reachedNodeReplacement Node),
) (nodeReplacement Node, err error) {
panic("TODO") // TODO
}
// FurtherTransform is similar to Transform, but takes an additional parameter
// in order to keep Path information complete when doing nested Transforms.
//
// Use FurtherTransform in the body of the applicative function of a Transform
// (or FurtherTransform) call: providing the so-far reachedPath as the nodePath
// to the FurtherTransform call will make sure the next, deeper applicative
// call will get a reachedPath which is the complete path all the way from the
// root node of the outermost transform.
//
// (Or, ignore all this and use Transform nested bare. It's your own Path
// information you're messing with; if you feel like your algorithm would
// work better seeing a locally scoped path rather than a more globally
// rooted one, that's absolutely fine.)
func FurtherTransform(
node Node,
nodePath Path,
path Path,
applicative func(reachedNode Node, reachedPath Path) (reachedNodeReplacement Node),
) (nodeReplacement Node, err error) {
panic("vanishing shortly")
}
package ipld
import "context"
// Traversal is an applicative function which takes one node and returns another,
// while also returning a Path describing a way to repeat the traversal, and
// an error if any part of the traversal failed.
//
// Traversal requires a TraversalProgress argument (which may be zero-valued),
// and returns a new TraversalProgress containing an updated Path.
//
// The most common type of Traversal is ipld.Path.Traverse, but it's possible
// to implement other kinds of Traversal function: for example, one could
// implement a traversal algorithm which performs some sort of search to
// select a target node (rather than knowing where it's going before it
// starts, as Path.Traverse does).
//
// In the case of error, the returned TraversalProgress may be zero and the
// Node may be nil. (The particular Path at which the error was encountered
// may be encoded in the error type.)
type Traversal func(tp TraversalProgress, start Node) (tp2 TraversalProgress, finish Node, err error)
type TraversalProgress struct {
Ctx context.Context
Path Path
}
// This package provides functional utilities for traversing and transforming
// IPLD nodes.
//
// The traversal.Path type provides a description of how to perform
// several steps across a Node tree. These are dual purpose:
// Paths can be used as instructions to do some traversal, and
// Paths are accumulated during traversals as a log of progress.
//
// "Focus" functions provide syntactic sugar for using traversal.Path to jump
// to a Node deep in a tree of other Nodes.
//
// "FocusedTransform" functions can do the same such deep jumps, and support
// mutation as well!
// (Of course, since ipld.Node is an immutable interface, more precisely
// speaking, "transformations" are implemented by rebuilding trees of nodes to
// emulate mutation in a copy-on-write way.)
//
// "Traverse" functions perform a walk of a Node graph, and apply visitor
// functions to multiple Nodes. Traverse can be guided by Selectors,
// which are a very general and extensible mechanism for filtering which
// Nodes are of interest, as well as guiding the traversal.
// (See the selector sub-package for more detail.)
//
// "TraverseTransform" is similar to Traverse, but with support for mutations.
//
// All of these functions -- the "Focus*" and "Traverse*" family alike --
// work via callbacks: they do the traversal, and call a user-provided function
// with a handle to the reached Node. Traversals and Focuses can be used
// recursively within this callback.
//
// All of these functions -- the "Focus*" and "Traverse*" family alike --
// include support for automatic resolution and loading of new Node trees
// whenever IPLD Links are encountered. This can be configured freely
// by providing LinkLoader interfaces in TraversalConfig.
// (TODO.)
//
// Some notes on the limits of usage:
//
// The Transform family of methods is most appropriate for patterns of usage
// which resemble point mutations.
// More general transformations -- zygohylohistomorphisms, etc -- will be best
// implemented by composing the read-only systems (e.g. Focus, Traverse) and
// handling the accumulation in the visitor functions.
//
// (Why? The "point mutation" use-case gets core library support because
// it's both high utility and highly clear how to implement it.
// More advanced transformations are nontrivial to provide generalized support
// for, for three reasons: efficiency is hard; not all existing research into
// categorical recursion schemes is necessarily applicable without modification
// (efficient behavior in a merkle-tree context is not the same as efficient
// behavior on uniform memory!); and we have the further compounding complexity
// of the range of choices available for underlying Node implementation.
// Therefore, attempts at generalization are not included here; handling these
// issues in concrete cases is easy, so we call it an application logic concern.
// However, exploring categorical recursion schemes as a library is encouraged!)
//
package traversal
package traversal
import (
"context"
cid "github.com/ipfs/go-cid"
ipld "github.com/ipld/go-ipld-prime"
)
// VisitFn is a read-only visitor.
type VisitFn func(TraversalProgress, ipld.Node) error
// TransformFn is like a visitor that can also return a new Node to replace the visited one.
type TransformFn func(TraversalProgress, ipld.Node) (ipld.Node, error)
// AdvVisitFn is like VisitFn, but for use with AdvTraversal: it gets additional arguments describing *why* this node is visited.
type AdvVisitFn func(TraversalProgress, ipld.Node, TraversalReason) (ipld.Node, error)
// TraversalReason provides additional information to traversals using AdvVisitFn.
type TraversalReason byte // enum = SelectionMatch | SelectionParent | SelectionCandidate // probably only pointful for block edges?
type TraversalProgress struct {
*TraversalConfig
Path Path // Path is how we reached the current point in the traversal.
LastBlock struct { // LastBlock stores the Path and CID of the last block edge we had to load. (It will always be zero in traversals with no linkloader.)
Path
cid.Cid
}
}
type TraversalConfig struct {
Ctx context.Context // Context carried through a traversal. Optional; use it if you need cancellation.
}
package traversal
import (
"fmt"
"strconv"
ipld "github.com/ipld/go-ipld-prime"
)
// Focus is a shortcut for kicking off
// TraversalProgress.Focus with an empty initial state
// (i.e. the Node given here is the "root" node of your operation).
func Focus(n ipld.Node, p Path, fn VisitFn) error {
return TraversalProgress{}.Focus(n, p, fn)
}
// FocusedTransform is a shortcut for kicking off
// TraversalProgress.FocusedTransform with an empty initial state
// (i.e. the Node given here is the "root" node of your operation).
func FocusedTransform(n ipld.Node, p Path, fn TransformFn) (ipld.Node, error) {
return TraversalProgress{}.FocusedTransform(n, p, fn)
}
// Focus traverses an ipld.Node graph, reaches a single Node,
// and applies a function to the reached node.
//
// Focus is a read-only traversal.
// See FocusedTransform if looking for a way to do an "update" to a Node.
//
// Focus can be used again inside the applied VisitFn!
// By using the TraversalProgress handed to the VisitFn, the traversal Path
// so far will continue to be extended, so continued nested uses of Focus
// will see a fully contextualized Path.
func (tp TraversalProgress) Focus(n ipld.Node, p Path, fn VisitFn) error {
for i, seg := range p.segments {
switch n.Kind() {
case ipld.ReprKind_Invalid:
return fmt.Errorf("cannot traverse node at %q: it is undefined", Path{p.segments[0:i]})
case ipld.ReprKind_Map:
next, err := n.TraverseField(seg)
if err != nil {
return fmt.Errorf("error traversing node at %q: %s", Path{p.segments[0:i]}, err)
}
n = next
case ipld.ReprKind_List:
intSeg, err := strconv.Atoi(seg)
if err != nil {
return fmt.Errorf("cannot traverse node at %q: the next path segment (%q) cannot be parsed as a number and the node is a list", Path{p.segments[0:i]}, seg)
}
next, err := n.TraverseIndex(intSeg)
if err != nil {
return fmt.Errorf("error traversing node at %q: %s", Path{p.segments[0:i]}, err)
}
n = next
case ipld.ReprKind_Link:
panic("NYI link loading") // TODO
// this would set a progress marker in `tp` as well
default:
return fmt.Errorf("error traversing node at %q: %s", Path{p.segments[0:i]}, fmt.Errorf("cannot traverse terminals"))
}
}
tp.Path = tp.Path.Join(p)
return fn(tp, n)
}
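
The nesting behaviour described in the Focus doc comment can be sketched on a toy tree. This is an illustration under stated assumptions, not the package's implementation: `node`, `progress`, `focus`, and `nestedPath` are hypothetical names, maps are the only container kind, and link loading is left out entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for ipld.Node.
type node struct {
	kids map[string]*node
	leaf string
}

// progress is a hypothetical stand-in for TraversalProgress:
// it carries the path walked so far.
type progress struct {
	path []string
}

// focus walks segs down from n and calls fn on the reached node,
// handing fn a progress value whose path has been extended by segs.
// As in Focus above, the path reported in errors is relative to segs.
func (p progress) focus(n *node, segs []string, fn func(progress, *node) error) error {
	for i, seg := range segs {
		next, ok := n.kids[seg]
		if !ok {
			return fmt.Errorf("error traversing node at %q: no field %q",
				strings.Join(segs[:i], "/"), seg)
		}
		n = next
	}
	// Extend a copy of the path, so the caller's progress value is untouched.
	p.path = append(append([]string{}, p.path...), segs...)
	return fn(p, n)
}

// nestedPath focuses on "foo", then on "bar" from inside the callback,
// and returns the path seen by the innermost callback.
func nestedPath(root *node) (string, error) {
	var seen string
	err := progress{}.focus(root, []string{"foo"}, func(tp progress, n *node) error {
		return tp.focus(n, []string{"bar"}, func(tp2 progress, _ *node) error {
			seen = strings.Join(tp2.path, "/")
			return nil
		})
	})
	return seen, err
}

func main() {
	root := &node{kids: map[string]*node{
		"foo": &node{kids: map[string]*node{"bar": &node{leaf: "x"}}},
	}}
	p, err := nestedPath(root)
	fmt.Println(p, err) // prints: foo/bar <nil>
}
```

Because the inner call is made on the progress value handed to the callback, the innermost visitor sees the full path from the outermost root rather than just "bar".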
// FocusedTransform traverses an ipld.Node graph, reaches a single Node,
// and applies a function to the reached node which may return a new Node.
//
// If the TransformFn returns a Node which is the same as the original
// reached node, the transform is a no-op, and the Node returned from the
// FocusedTransform call as a whole will also be the same as its starting Node.
//
// Otherwise, the reached node will be "replaced" with the new Node -- meaning
// that new intermediate nodes will be constructed to also replace each
// parent Node that was traversed to get here, thus propagating the changes in
// a copy-on-write fashion -- and the FocusedTransform function as a whole will
// return a new Node containing identical children except for those replaced.
//
// FocusedTransform can be used again inside the applied function!
// This kind of composition can be useful for doing batches of updates.
// E.g. if you have a large Node graph which contains a 100-element list, and
// you want to replace elements 12, 32, and 95 of that list:
// then you should FocusedTransform to the list first, and inside that
// TransformFn's body, you can replace the entire list with a new one
// that is composed of copies of everything but those elements -- including
// using more FocusedTransform calls as desired to produce the replacement elements
// if it so happens that those replacement elements are easiest to construct
// by regarding them as incremental updates to the previous values.
//
// Note that anything you can do with the FocusedTransform function, you can
// also do with regular Node and NodeBuilder usage directly. FocusedTransform
// just does a large amount of the intermediate bookkeeping that's useful when
// creating new values which are partial updates to existing values.
func (tp TraversalProgress) FocusedTransform(n ipld.Node, p Path, fn TransformFn) (ipld.Node, error) {
panic("TODO") // TODO surprisingly different from Focus -- need to store the nodes we traversed, and be able to do building.
}
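
Since FocusedTransform is still a TODO in this commit, the copy-on-write behaviour its doc comment describes can only be sketched. The following is a hypothetical illustration on a toy map-only tree (`node` and `transform` are invented names, and no NodeBuilder is involved). It shows the two documented outcomes: an identical return value is a no-op, and a changed value causes each parent on the path to be rebuilt while untouched siblings are shared.

```go
package main

import "fmt"

// node is a hypothetical immutable tree: a map of children or a leaf.
type node struct {
	kids map[string]*node
	leaf string
}

// transform walks segs down from n, applies fn to the reached node,
// and rebuilds the parents along the path only if fn returned a different node.
func transform(n *node, segs []string, fn func(*node) *node) (*node, error) {
	if len(segs) == 0 {
		return fn(n), nil
	}
	child, ok := n.kids[segs[0]]
	if !ok {
		return nil, fmt.Errorf("no such field %q", segs[0])
	}
	newChild, err := transform(child, segs[1:], fn)
	if err != nil {
		return nil, err
	}
	if newChild == child {
		return n, nil // no-op: hand back the identical parent
	}
	// Copy-on-write: shallow-copy this level and swap in the one changed child.
	kids := make(map[string]*node, len(n.kids))
	for k, v := range n.kids {
		kids[k] = v
	}
	kids[segs[0]] = newChild
	return &node{kids: kids}, nil
}

func main() {
	baz := &node{leaf: "y"}
	root := &node{kids: map[string]*node{
		"foo": &node{kids: map[string]*node{"bar": &node{leaf: "x"}}},
		"baz": baz,
	}}
	updated, err := transform(root, []string{"foo", "bar"}, func(*node) *node {
		return &node{leaf: "z"}
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(updated.kids["foo"].kids["bar"].leaf) // prints: z
	fmt.Println(root.kids["foo"].kids["bar"].leaf)    // prints: x
	fmt.Println(updated.kids["baz"] == baz)           // prints: true
}
```

Recursion keeps the stack of traversed parents implicitly on the call stack; an iterative version would need the explicit node stack that the commit message mentions for the transform family.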
package ipld
package traversal
import (
"fmt"
"strconv"
"strings"
)
var (
_ Traversal = Path{}.Traverse // (type assertion)
ipld "github.com/ipld/go-ipld-prime"
)
// Path represents a MerklePath. TODO:standards-doc-link.
//
// Paths are used in describing progress in a traversal;
// and can also be used as an instruction for a specific traverse.
//
// IPLD Paths can only go down: that is, each segment must traverse one node.
// There is no ".." which means "go up";
// and there is no "." which means "stay here";
......@@ -72,24 +73,33 @@ func (p Path) Join(p2 Path) Path {
return p
}
// Path.Traverse is an implementation of Traversal that makes a simple
// direct walk over a sequence of nodes, using each segment of the path
// to get the next node until all path segments have been consumed.
// Parent returns a path with the last of its segments popped off (or
// the zero path if it's already empty).
func (p Path) Parent() Path {
if len(p.segments) == 0 {
return Path{}
}
return Path{p.segments[0 : len(p.segments)-1]}
}
// traverse makes a simple direct walk over a sequence of nodes,
// using each segment of the path to get the next node,
// proceeding until all path segments have been consumed.
//
// If one of the node traverse steps returns an error, that node and the
// path so far including that node will be returned, as well as the error.
func (p Path) Traverse(tp TraversalProgress, start Node) (_ TraversalProgress, reached Node, err error) {
// This method may be removed. It doesn't know about link loading;
// and this limits its usefulness.
func (p Path) traverse(tp TraversalProgress, start ipld.Node) (_ TraversalProgress, reached ipld.Node, err error) {
for i, seg := range p.segments {
switch start.Kind() {
case ReprKind_Invalid:
case ipld.ReprKind_Invalid:
return TraversalProgress{}, nil, fmt.Errorf("cannot traverse node at %q: it is undefined", Path{p.segments[0:i]})
case ReprKind_Map:
case ipld.ReprKind_Map:
next, err := start.TraverseField(seg)
if err != nil {
return TraversalProgress{}, nil, fmt.Errorf("error traversing node at %q: %s", Path{p.segments[0:i]}, err)
}
start = next
case ReprKind_List:
case ipld.ReprKind_List:
intSeg, err := strconv.Atoi(seg)
if err != nil {
return TraversalProgress{}, nil, fmt.Errorf("cannot traverse node at %q: the next path segment (%q) cannot be parsed as a number and the node is a list", Path{p.segments[0:i]}, seg)
......
package ipld_test
package traversal
import (
"fmt"
......@@ -6,7 +6,6 @@ import (
. "github.com/warpfork/go-wish"
ipld "github.com/ipld/go-ipld-prime"
ipldfree "github.com/ipld/go-ipld-prime/impl/free"
)
......@@ -17,10 +16,10 @@ func TestPathTraversal(t *testing.T) {
n0.SetString("asdf")
n.SetIndex(0, n0)
tp, nn, e := ipld.ParsePath("0").Traverse(ipld.TraversalProgress{}, n)
tp, nn, e := ParsePath("0").traverse(TraversalProgress{}, n)
Wish(t, nn, ShouldEqual, n0)
Wish(t, tp.Path, ShouldEqual, ipld.ParsePath("0"))
Wish(t, tp.Path, ShouldEqual, ParsePath("0"))
Wish(t, e, ShouldEqual, nil)
})
t.Run("traversing map", func(t *testing.T) {
......@@ -29,10 +28,10 @@ func TestPathTraversal(t *testing.T) {
n0.SetString("asdf")
n.SetField("foo", n0)
tp, nn, e := ipld.ParsePath("foo").Traverse(ipld.TraversalProgress{}, n)
tp, nn, e := ParsePath("foo").traverse(TraversalProgress{}, n)
Wish(t, nn, ShouldEqual, n0)
Wish(t, tp.Path, ShouldEqual, ipld.ParsePath("foo"))
Wish(t, tp.Path, ShouldEqual, ParsePath("foo"))
Wish(t, e, ShouldEqual, nil)
})
t.Run("traversing deeper", func(t *testing.T) {
......@@ -45,10 +44,10 @@ func TestPathTraversal(t *testing.T) {
n0.SetIndex(1, n01)
n.SetField("foo", n0)
tp, nn, e := ipld.ParsePath("foo/1/bar").Traverse(ipld.TraversalProgress{}, n)
tp, nn, e := ParsePath("foo/1/bar").traverse(TraversalProgress{}, n)
Wish(t, nn, ShouldEqual, n010)
Wish(t, tp.Path, ShouldEqual, ipld.ParsePath("foo/1/bar"))
Wish(t, tp.Path, ShouldEqual, ParsePath("foo/1/bar"))
Wish(t, e, ShouldEqual, nil)
})
t.Run("traversal error on unexpected terminals", func(t *testing.T) {
......@@ -62,17 +61,17 @@ func TestPathTraversal(t *testing.T) {
n.SetField("foo", n0)
t.Run("deep terminal", func(t *testing.T) {
tp, nn, e := ipld.ParsePath("foo/1/bar/drat").Traverse(ipld.TraversalProgress{}, n)
tp, nn, e := ParsePath("foo/1/bar/drat").traverse(TraversalProgress{}, n)
Wish(t, nn, ShouldEqual, nil)
Wish(t, tp.Path, ShouldEqual, ipld.Path{})
Wish(t, tp.Path, ShouldEqual, Path{})
Wish(t, e, ShouldEqual, fmt.Errorf(`error traversing node at "foo/1/bar": cannot traverse terminals`))
})
t.Run("immediate terminal", func(t *testing.T) {
tp, nn, e := ipld.ParsePath("drat").Traverse(ipld.TraversalProgress{}, n010)
tp, nn, e := ParsePath("drat").traverse(TraversalProgress{}, n010)
Wish(t, nn, ShouldEqual, nil)
Wish(t, tp.Path, ShouldEqual, ipld.Path{})
Wish(t, tp.Path, ShouldEqual, Path{})
Wish(t, e, ShouldEqual, fmt.Errorf(`error traversing node at "": cannot traverse terminals`))
})
})
......@@ -86,10 +85,10 @@ func TestPathTraversal(t *testing.T) {
n0.SetIndex(1, n01)
n.SetField("foo", n0)
tp, nn, e := ipld.ParsePath("foo/1/drat").Traverse(ipld.TraversalProgress{}, n)
tp, nn, e := ParsePath("foo/1/drat").traverse(TraversalProgress{}, n)
Wish(t, nn, ShouldEqual, nil)
Wish(t, tp.Path, ShouldEqual, ipld.Path{})
Wish(t, tp.Path, ShouldEqual, Path{})
Wish(t, e, ShouldEqual, fmt.Errorf(`error traversing node at "foo/1": 404`))
})
}
package ipld
package traversal
import (
"testing"
......