Commit 2e1845b3 authored by Eric Myhre

Several significant clarifications to Path docs.

Primarily, being committal and direct about how special values, erm...
*aren't* -- namely, "/" is a valid segment, and so is empty string --
and confessing how much tricky work that makes.

Several methods are now more upfront about how much they *do not do
good things* on such values.
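
To make the problem concrete, here is a minimal sketch of the lossy round trip. It does not use the real package: `joinSegments` and `splitSegments` are hypothetical stand-ins that only mimic the behavior the doc comments describe for `String` (join on "/", no escaping) and `ParsePath` (split on "/", drop empty segments).

```go
package main

import (
	"fmt"
	"strings"
)

// joinSegments is a hypothetical stand-in for the documented String behavior:
// segments are joined with "/" and nothing is escaped.
func joinSegments(segs []string) string {
	return strings.Join(segs, "/")
}

// splitSegments is a hypothetical stand-in for the documented ParsePath
// behavior: split on "/" and silently discard empty segments.
func splitSegments(s string) []string {
	var out []string
	for _, seg := range strings.Split(s, "/") {
		if seg != "" {
			out = append(out, seg)
		}
	}
	return out
}

func main() {
	// Four valid segments, two of them "tricky": an empty string and "/".
	original := []string{"a", "", "/", "b"}
	encoded := joinSegments(original)
	decoded := splitSegments(encoded)
	fmt.Printf("%q -> %q -> %q\n", original, encoded, decoded)
}
```

Under these assumptions the four segments encode to "a////b" and decode to just "a" and "b": the empty segment and the "/" segment are both lost.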

Added some new constructors for Path which *do* work for all possible
values.  (You could've handled such tricky values with a series of
Join calls, previously, but that would be ergonomically grueling.)
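
As a rough illustration of why a slice-based constructor helps, the sketch below rebuilds a cut-down `Path` and `NewPath` locally. The `PathSegment` here is a simplified, string-only stand-in (the real one can also hold an integer), so treat it as an assumption-laden sketch rather than the package's actual types.

```go
package main

import "fmt"

// PathSegment is a simplified stand-in that only holds a string.
type PathSegment struct{ s string }

// Path mirrors the struct shown in the diff: a slice of segments.
type Path struct{ segments []PathSegment }

// NewPath copies the given segments defensively, as the diff's version does.
func NewPath(segments []PathSegment) Path {
	p := Path{make([]PathSegment, len(segments))}
	copy(p.segments, segments)
	return p
}

func main() {
	// Empty-string and "/" segments are passed through untouched,
	// because no string parsing is involved.
	segs := []PathSegment{{s: "a"}, {s: ""}, {s: "/"}, {s: "b"}}
	p := NewPath(segs)

	// Mutating the caller's slice afterwards does not affect the Path.
	segs[0] = PathSegment{s: "mutated"}

	fmt.Println(len(p.segments), p.segments[0].s) // prints "4 a"
}
```

Because the segments never pass through a string encoding, nothing needs escaping and nothing is collapsed.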

These facts are all de facto 'true' to the best of my knowledge now.
However, the IPLD specs repo is a bit on the quiet side about them
at present (precisely because it's such an irritating little nest of
edge cases and fun stuff)... and so the comments are also littered
with "this may change" warnings.  Still, it's better to be accurate
about what the code does and does not do in its current state.

I'd *like* to formally specify an escaping system and canonical
string encoding for paths.  That should start in the IPLD Specs
repo, though, and involve fixtures, so I won't start it here now.

Thanks to @ribasushi for the kick in the shins that these docs needed
work and clarifications.
parent 907fb144
...@@ -4,37 +4,99 @@ import ( ...@@ -4,37 +4,99 @@ import (
"strings" "strings"
) )
// Path is used in describing progress in a traversal; // Path describes a series of steps across a tree or DAG of Node,
// and can also be used as an instruction for a specific traverse. // where each segment in the path is a map key or list index
// (literally, Path is a slice of PathSegment values).
// Path is used in describing progress in a traversal; and
// can also be used as an instruction for traversing from one Node to another.
// Path values will also often be encountered as part of error messages.
// //
// (Note that Paths are useful as an instruction for traversing from
// *one* Node to *one* other Node; to do a walk from one Node and visit
// *several* Nodes based on some sort of pattern, look to IPLD Selectors,
// and the 'traversal/selector' package in this project.)
//
// Path values are always relative.
// Observe how 'traversal.Focus' requires both a Node and a Path argument --
// where to start, and where to go, respectively.
// Similarly, error values which include a Path will be speaking in reference
// to the "starting Node" in whatever context they arose from.
//
// The canonical form of a Path is as a list of PathSegment.
// Each PathSegment is a string; by convention, the string should be
// in UTF-8 encoding and use NFC normalization, but all operations
// will regard the string as its constituent eight-bit bytes.
//
// There are no illegal or magical characters in IPLD Paths
// (in particular, do not mistake them for UNIX system paths).
// IPLD Paths can only go down: that is, each segment must traverse one node. // IPLD Paths can only go down: that is, each segment must traverse one node.
// There is no ".." which means "go up"; // There is no ".." which means "go up";
// and there is no "." which means "stay here"; // and there is no "." which means "stay here".
// and it is not valid to have an empty path segment. // IPLD Paths have no magic behavior around characters such as "~".
// IPLD Paths do not have a concept of "globs" nor behave specially
// for a path segment string of "*" (but you may wish to see 'Selectors'
// for globbing-like features that traverse over IPLD data).
// //
// (Note: path strings as interpreted by UnixFS may certainly have concepts // An empty string is a valid PathSegment.
// of ".." and "."! But UnixFS is built upon IPLD; IPLD has no idea of this.) // (This leads to some unfortunate complications when wishing to represent
// paths in a simple string format; however, consider that maps do exist
// in serialized data in the wild where an empty string is used as the key:
// it is important we be able to correctly describe and address this!)
// //
// Paths are representable as strings. When represented as a string, each // A string of "/" is a valid PathSegment.
// segment is separated by a "/" character. // (As with empty strings, this is unfortunate (in particular, because it
// (It follows that path segments may not themselves contain a "/" character.) // very much doesn't match up well with expectations popularized by UNIX-like
// (Note: escaping may be specified and supported in the future; currently, it is not.) // filesystems); but, as with empty strings, maps which contain such a key
// certainly exist, and it is important that we be able to regard them!)
// //
// For an IPLD Path to be represented as a string, an encoding system
// including escaping is necessary. At present, there is not a single
// canonical specification for such an escaping; we expect to decide one
// in the future, but this is not yet settled and done.
// (This implementation has a 'String' method, but it contains caveats
// and may be ambiguous for some content. This may be fixed in the future.)
type Path struct { type Path struct {
segments []PathSegment segments []PathSegment
} }
// ParsePath converts a string to an IPLD Path, parsing the string into a segmented Path. // NewPath returns a Path composed of the given segments.
// //
// Each segment of the path string should be separated by a "/" character. // This constructor function does a defensive copy,
// in case your segments slice should mutate in the future.
// (Use NewPathNocopy if this is a performance concern,
// and you're sure you know what you're doing.)
func NewPath(segments []PathSegment) Path {
p := Path{make([]PathSegment, len(segments))}
copy(p.segments, segments)
return p
}
// NewPathNocopy is identical to NewPath but trusts that
// the segments slice you provide will not be mutated.
func NewPathNocopy(segments []PathSegment) Path {
return Path{segments}
}
// ParsePath converts a string to an IPLD Path, doing a basic parsing of the
// string using "/" as a delimiter to produce a segmented Path.
// This is a handy, but not a general-purpose nor spec-compliant (!),
// way to create a Path: it cannot represent all valid paths.
// //
// Multiple subsequent "/" characters will be silently collapsed. // Multiple subsequent "/" characters will be silently collapsed.
// E.g., `"foo///bar"` will be treated equivalently to `"foo/bar"`. // E.g., `"foo///bar"` will be treated equivalently to `"foo/bar"`.
// Prefixed and suffixed extraneous "/" characters are also discarded. // Prefixed and suffixed extraneous "/" characters are also discarded.
// This makes this constructor incapable of handling some possible Path values
// (specifically: paths with empty segments cannot be created with this constructor).
// //
// No "cleaning" of the path occurs. See the documentation of the Path struct; // There is no escaping mechanism used by this function.
// This makes this constructor incapable of handling some possible Path values
// (specifically, a path segment containing "/" cannot be created, because it
// will always be interpreted as a segment separator).
//
// No other "cleaning" of the path occurs. See the documentation of the Path struct;
// in particular, note that ".." does not mean "go up", nor does "." mean "stay here" -- // in particular, note that ".." does not mean "go up", nor does "." mean "stay here" --
// correspondingly, there isn't anything to "clean". // correspondingly, there isn't anything to "clean" in the same sense as
// 'filepath.Clean' from the standard library filesystem path packages would.
func ParsePath(pth string) Path { func ParsePath(pth string) Path {
// FUTURE: we should probably have some escaping mechanism which makes // FUTURE: we should probably have some escaping mechanism which makes
// it possible to encode a slash in a segment. Specification needed. // it possible to encode a slash in a segment. Specification needed.
...@@ -49,6 +111,14 @@ func ParsePath(pth string) Path { ...@@ -49,6 +111,14 @@ func ParsePath(pth string) Path {
// String representation of a Path is simply the join of each segment with '/'. // String representation of a Path is simply the join of each segment with '/'.
// It does not include a leading nor trailing slash. // It does not include a leading nor trailing slash.
//
// This is a handy, but not a general-purpose nor spec-compliant (!),
// way to reduce a Path to a string.
// There is no escaping mechanism used by this function,
// and as a result, not all possible valid Path values (such as those with
// empty segments or with segments containing "/") can be encoded unambiguously.
// For Path values containing these problematic segments, ParsePath applied
// to the string returned from this function may return a nonequal Path value.
func (p Path) String() string { func (p Path) String() string {
l := len(p.segments) l := len(p.segments)
if l == 0 { if l == 0 {
......
...@@ -15,10 +15,12 @@ import ( ...@@ -15,10 +15,12 @@ import (
// Internally, PathSegment will store either a string or an integer, // Internally, PathSegment will store either a string or an integer,
// depending on how it was constructed, // depending on how it was constructed,
// and will automatically convert to the other on request. // and will automatically convert to the other on request.
// (This means if two pieces of code communicate using PathSegment, one producing ints and the other expecting ints, they will work together efficiently.) // (This means if two pieces of code communicate using PathSegment,
// one producing ints and the other expecting ints,
// then they will work together efficiently.)
// PathSegment in a Path produced by ParsePath generally have all strings internally, // PathSegment in a Path produced by ParsePath generally have all strings internally,
// because there is distinction possible when parsing a Path string // because there is no distinction possible when parsing a Path string
// (and attempting to pre-parse all strings into ints "in case" would waste time in almost all cases). // (and attempting to pre-parse all strings into ints "just in case" would waste time in almost all cases).
type PathSegment struct { type PathSegment struct {
/* /*
A quick implementation note about the Go compiler and "union" semantics: A quick implementation note about the Go compiler and "union" semantics:
......
...@@ -13,13 +13,16 @@ func TestParsePath(t *testing.T) { ...@@ -13,13 +13,16 @@ func TestParsePath(t *testing.T) {
t.Run("parsing three segments", func(t *testing.T) { t.Run("parsing three segments", func(t *testing.T) {
Wish(t, ParsePath("0/foo/2").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "foo"}, {s: "2"}}) Wish(t, ParsePath("0/foo/2").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "foo"}, {s: "2"}})
}) })
t.Run("eliding empty segments", func(t *testing.T) {
Wish(t, ParsePath("0//2").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "2"}})
})
t.Run("eliding leading slashes", func(t *testing.T) { t.Run("eliding leading slashes", func(t *testing.T) {
Wish(t, ParsePath("/0/2").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "2"}}) Wish(t, ParsePath("/0/2").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "2"}})
}) })
t.Run("eliding trailing", func(t *testing.T) { t.Run("eliding trailing", func(t *testing.T) {
Wish(t, ParsePath("0/2/").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "2"}}) Wish(t, ParsePath("0/2/").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "2"}})
}) })
t.Run("eliding empty segments", func(t *testing.T) { // NOTE: a spec for string encoding might cause this to change in the future!
Wish(t, ParsePath("0//2").segments, ShouldEqual, []PathSegment{{s: "0"}, {s: "2"}})
})
t.Run("escaping segments", func(t *testing.T) { // NOTE: a spec for string encoding might cause this to change in the future!
Wish(t, ParsePath(`0/\//2`).segments, ShouldEqual, []PathSegment{{s: "0"}, {s: `\`}, {s: "2"}})
})
} }