Commits · 46ca29fe25dbb84396829806f8fc32edf314dda8 · ld / go-ld-prime

20 Mar, 2019 1 commit
- fix(cidlink): init multicodec registry · 46ca29fe
  hannahhoward authored Mar 19, 2019
```
allocate multicodec tables so register functions don't panic
```
  46ca29fe
18 Mar, 2019 1 commit
- Fix rather important typo in multicodec use. · fc13adf1
  Eric Myhre authored Mar 18, 2019
```
Thanks Hannah!
Signed-off-by: Eric Myhre <hash@exultant.us>
```
  fc13adf1
16 Mar, 2019 13 commits

Merge branch 'drop-mutable-node' · adc089c2
Eric Myhre authored Mar 16, 2019

adc089c2

Add NodeBuilder to Node interface. · 025fcf8a

Eric Myhre authored Mar 16, 2019

(... officially.  Lots of docs have probably already been stating that
this is there.  Now it actually... is.)
Signed-off-by: Eric Myhre <hash@exultant.us>

025fcf8a

Drop MutableNode interface. · 6428f14f

Eric Myhre authored Mar 16, 2019

This has been deprecated and replaced by the NodeBuilder system
for a good while now; time to scrape it into the dustbin completely.

Tests that were primarily on the mutable node system itself also
drop, so, this is a *very* large delete diff.

A few other tests used MutableNode just incidentally, and those are
quick fixed to use NodeBuilder.
Signed-off-by: Eric Myhre <hash@exultant.us>

6428f14f

Merge branch 'linking-redux' · fdbbd90f
Eric Myhre authored Mar 16, 2019

fdbbd90f
Update traversal package to new linking system. · b8550cf5
Eric Myhre authored Mar 16, 2019
```
Signed-off-by: Eric Myhre <hash@exultant.us>
```
b8550cf5

Dag-json marshal and unmarshal. · 10442b3c

Eric Myhre authored Mar 16, 2019

And fixes to all the encoding systems for checking lengths when they're
provided by map and list start tokens.  Inconsistencies there are now
errors.
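
The length check described above can be sketched as follows. This is a minimal illustration only: the `token` type, its `length` field, and `countList` are hypothetical stand-ins, not refmt's token API or the code in this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenKind and token are simplified, hypothetical stand-ins for a token
// stream; they are not refmt's actual types.
type tokenKind int

const (
	listOpen tokenKind = iota
	listClose
	value
)

type token struct {
	kind   tokenKind
	length int // declared length on listOpen; -1 means "not provided"
}

var errLengthMismatch = errors.New("declared length does not match number of entries")

// countList consumes one flat list and returns how many values it held.
// If the open token declared a length, any disagreement is an error.
func countList(toks []token) (int, error) {
	if len(toks) == 0 || toks[0].kind != listOpen {
		return 0, errors.New("expected list open token")
	}
	declared := toks[0].length
	n := 0
	for _, t := range toks[1:] {
		switch t.kind {
		case value:
			n++
			if declared >= 0 && n > declared {
				return n, errLengthMismatch // more entries than announced
			}
		case listClose:
			if declared >= 0 && n != declared {
				return n, errLengthMismatch // fewer entries than announced
			}
			return n, nil
		default:
			return n, errors.New("nested lists are out of scope for this sketch")
		}
	}
	return n, errors.New("unexpected end of tokens")
}

func main() {
	ok := []token{{listOpen, 2}, {value, 0}, {value, 0}, {listClose, 0}}
	bad := []token{{listOpen, 3}, {value, 0}, {listClose, 0}}
	fmt.Println(countList(ok))
	fmt.Println(countList(bad))
}
```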

And some consistency changes across all the encoders to keep the diff
of the dag-json system as minimal as possible.  (Dag-json needs to
refer to the last handful of tokens sometimes when parsing a mapClose,
so we keep their values outside of the loop body now.)
Signed-off-by: Eric Myhre <hash@exultant.us>

10442b3c

Dag-cbor marshal/unmarshal. · a274b2db

Eric Myhre authored Mar 16, 2019

We now have CID support!  You can create links backed by cids,
and marshal them with dag-cbor; and you can unmarshal cbor data
with dag-cbor and expect things with the CID link tag to be parsed
into CIDs and exposed as IPLD Links.  Yay!

(Dag-json is lagging.  The parse for those links is... more involved.
When supported, it'll similarly have its own unmarshal and marshal
just like the ones this diff introduces for dag-cbor.)
Signed-off-by: Eric Myhre <hash@exultant.us>

a274b2db

Update Node interfaces to use Link instead of CID. · 694c6f3c

Eric Myhre authored Mar 16, 2019

As detailed in comments a few commits ago, this is part of a big, big
roll towards keeping linking details far enough off to one side that
one can actually use most of the IPLD system without forming an
explicit compile-time dependency on any linking features (until, of
course, one uses the linking features).

This is a surprisingly small diff, because... well, because most of
the *interesting* features around linking simply weren't implemented
yet, and at this point everything that is has already been isolated
in the new cidlink and related encoding packages.
"CID" was *already* just a semantic placeholder that meant "eh, link".
Signed-off-by: Eric Myhre <hash@exultant.us>

694c6f3c

Finish creating cid in cidlink.LinkBuilder. · 89c40af3
Eric Myhre authored Mar 16, 2019
```
Signed-off-by: Eric Myhre <hash@exultant.us>
```
89c40af3

Repose refactored into several packages. · 57832ad7

Eric Myhre authored Mar 16, 2019

All of which now explicitly confess their cid-specificness.

Some things in the 'repose' draft were *really* twisted; I'm glad that
first draft is getting replaced before anything actually used it...

For example, NodeBuilderChooser was just a ridiculously misplaced abstraction.
When doing a traversal, you have the local type information (if any) already
in hand, and can just... pick an appropriate NodeBuilder already. Now that
the NodeBuilder is simply a parameter to Link.Load, everything shakes out
much, much more clearly as a result.

The cidlink package contains all concrete references to Cids. This implements
the ipld.Link and ipld.LinkBuilder interfaces... but if you don't import
the cidlink package in your program, you won't find any of the cid packages
(nor their numerous transitive dependencies) in your dependency set.

Multicodecs are now a registry which is confined in scope to the cidlink
package. (It's global, but I think in practice this will be fine: it's a
plugin system, and there's no good cause for allowing variations in how
those magic bytes of cids are interpreted.)

There are now dagcbor and dagjson packages for encoding. These explicitly
refer to the cidlink package (and register themselves on package init).
While these refer to cidlink, you could imagine we might also introduce
other encoding packages which *don't*.

Finally, note that the dagcbor and dagjson packages are in fact still not
done. This is the same logic/completeness they had before this diff...
which does not include actual parsing of cids! However, it's now clear
where to introduce that and at what scope. (It will probably require
more duplication of unmarshalling code than desirable, but, alas, that
might simply be the cost of doing business. Dagjson in particular has
topologically "interesting" things to handle that I'd be loath to make
a sufficiently pluggable unmarshal traversal to support; it would be
possible, but likely a noticeable slowdown to continuously check and
then promptly disregard the interesting case.)

Some work remains, but this is now pretty close to sanity.
Signed-off-by: Eric Myhre <hash@exultant.us>

57832ad7

Add context params to link load and build. · dbf02a67

Eric Myhre authored Mar 16, 2019

These belong in load and build since those are at the top of the stack;
they also (perhaps surprisingly) aren't necessary as params to the
Loader and Storer function interfaces (since those just return readers
and writers, and thus 'cancel' for those is 'stop using it').
Signed-off-by: Eric Myhre <hash@exultant.us>

dbf02a67

Begin introduction of new linking interfaces. · 6950c5cf

Eric Myhre authored Mar 16, 2019

This is going to be the start of a pretty hectic set of commits.

1. The LinkLoader, LinkBuilder, and LinkContext types currently in
the traversal package are hereby doomed (and will be deleted by
the time this branch is ready to land).

2. Cid itself will disappear from almost all remaining concrete uses.
For example, traversal.TraversalProgress.LastBlock will use Link
instead of Cid. (You'll have to cast back to Cid if that detail
is important to your application!)

3. The 'repose' package is pretty much getting nuked.

Instead: we have these new interfaces. Link will be abstract.

We'll add a linking package with a subpackage containing implementation
with Cids. Our encoding systems will then also live there: this makes
sense since multicodec is definitely a detail associated with Cids.
We'll also have dag-cbor and dag-json specific encodings in subpackages
associated with this whole thing: those will be able to read and reify
Link instances during their handling of serial data. This too is
parsimonious and correct (e.g. dag-json parsing is technically distinct
from regular json parsing, even if it's experientially close).

And if all goes according to plan, we will -- shockingly -- be able to
use almost the entirety of the IPLD system... *without* forming an
explicit import dependency on the Cid packages *until* we directly use
any link loading features based on Cids. That'll be neat.
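
The idea of an abstract Link that callers cast back only when they care can be sketched like this. All names here (`Link`, `cidLink`, `otherLink`, `describe`) are hypothetical and simplified; in particular the string field is a placeholder, not a real CID type.

```go
package main

import "fmt"

// Link is a hypothetical minimal abstract link for this sketch.
type Link interface {
	String() string
}

// cidLink stands in for a CID-backed implementation. In the layout described
// above it would live in its own package, so only programs importing that
// package would pull in the CID dependencies.
type cidLink struct {
	cid string // placeholder; a real implementation would hold a CID value
}

func (l cidLink) String() string { return l.cid }

// otherLink is a second implementation, to show that callers stay generic.
type otherLink string

func (l otherLink) String() string { return string(l) }

// describe works on any Link and asserts back to the concrete type only
// when that detail matters.
func describe(l Link) string {
	if cl, ok := l.(cidLink); ok {
		return "cid-backed link: " + cl.cid
	}
	return "opaque link: " + l.String()
}

func main() {
	fmt.Println(describe(cidLink{cid: "example-cid"}))
	fmt.Println(describe(otherLink("example-other")))
}
```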

All these changes will be staged across a series of commits because
the total diff will definitely be something fierce.
Signed-off-by: Eric Myhre <hash@exultant.us>

6950c5cf

Move Path to the root package. · d0edb867

Eric Myhre authored Mar 16, 2019

Yes, this one has moved about a bunch now. Hopefully this is the last time.
It's true and elegant that paths really only emerge as descriptions of traversal
progress; however, this misses a few other practicalities.
There are other kinds of traversal (other than the traversal package, whoa!) out
there: see the typesystem packages, which had grown a custom path implementation
simply for error reporting messages!
In general, we seem to want to have Path around for logging and errors, which
will make it increasingly desirable to have available in the root package when
we begin to clean up towards strongly typed errors.
And we also need Path for LinkContext -- and wherever that comes to rest, it
definitely needs to not be an import cycle problem, which it *is* if Path is
in traversal and we wanted LinkContext to be *anywhere* else.
All of these point to moving Path back up to the root, and the errors concern
in particular cinches it.

Drop the Path.traverse method. As has been noted in its comment for a while
now, that method wasn't useful for much, having been replaced by features
in the traversal package.

(Also drop the tests specific to the Path.traverse method. We should write
more tests against the features now implemented by traversal.Focus... but at
this point, it'd be easier to start over. The tests we're dropping are against
a different model of traversal (returns rather than visitors) and are also
built against the old hacky ipldfree mutable model which is deprecated and
soon to be dropped.)

Also, a small docs fix: drop description of this Path implementation as a
"merklepath" -- it is not. These paths are all relative, and do not contain
an innate understanding of hashed object identifiers at their first segment.
Signed-off-by: Eric Myhre <hash@exultant.us>

d0edb867

13 Mar, 2019 5 commits

Merge branch 'docs' · 561e2a49
Eric Myhre authored Mar 13, 2019

561e2a49

doc: drop mention of hypothetical "IPFN". · 2677c902

Eric Myhre authored Mar 13, 2019

It's no good to introduce a term just to spend a halfhearted paragraph
partially describing it just to say that it's out of scope.

The schema page also nowadays already has some mentions of how schema
are designed to avoid turing completeness and are not intended to
encompass dependent typing systems, so this descoping is already noted.
Signed-off-by: Eric Myhre <hash@exultant.us>

2677c902

Merge branch 'integration' · 4671c928
Eric Myhre authored Mar 13, 2019

4671c928

Update deps. · a64d28cd

Eric Myhre authored Mar 13, 2019

A base32 library moved github orgs and name.

One small change to repose (refmt added a param to cbor decoders).

Other updates unimpactful.
Signed-off-by: Eric Myhre <hash@exultant.us>

a64d28cd

doc: typo fix · 7bfa6b34
Eric Myhre authored Mar 13, 2019
```
(thanks rod)
Signed-off-by: Eric Myhre <hash@exultant.us>
```
7bfa6b34

12 Mar, 2019 6 commits

doc: Schema syntax doc, with examples. · b7b52701

Eric Myhre authored Mar 12, 2019

And links pointing out the schema-schema and other examples over in the
type declaration implementation packages, which are by far the most
comprehensive thing that's easy to link at the moment.

Lots of TODOs, and I think I'll probably merge with them remaining:
there's a *lot* to doc here, and while it's good to enumerate the sheer
scope of it all, filling it out is not my highest priority for the day.

On the bright side, the schema-schema *is* pretty comprehensible.
Signed-off-by: Eric Myhre <hash@exultant.us>

b7b52701

doc: advanced layouts. · 5980f79a

Eric Myhre authored Mar 12, 2019

Turns out I couldn't talk myself into dropping the content in the
previous commit without having *some* replacement ready.

There's still some open TODOs and speculation here, but it's marked
as such, and it's better to have it written than not.

Some of these docs harken way back to this gist:
https://gist.github.com/warpfork/6df17e791936d1f9b0d5e5483678c8bf
with only moderate updates to the most recent understandings and nouns.
Signed-off-by: Eric Myhre <hash@exultant.us>

5980f79a

Drop speculations about advLayout and schema. · d4744975

Eric Myhre authored Mar 12, 2019

The content is still roughly accurate, but... well, speculative.

The other parts of this old doc fragment (package layout details) are
now in the newer more complete schema doc file, so, can be dropped.
(Which would leave *only* speculations in this file, so...)
Signed-off-by: Eric Myhre <hash@exultant.us>

d4744975

doc: brush up some schema docs. · 7293707a

Eric Myhre authored Mar 12, 2019

And we'll need a new file for syntax shortly.  Blank as yet.
Signed-off-by: Eric Myhre <hash@exultant.us>

7293707a

doc: links to schema docs from doc readme index. · 1b2f352d
Eric Myhre authored Mar 12, 2019
```
Signed-off-by: Eric Myhre <hash@exultant.us>
```
1b2f352d

doc: finish NodeBuilder docs; more internal links. · 53df5195

Eric Myhre authored Mar 12, 2019

And add godoc links.

And some quick words about the fluent APIs.
Signed-off-by: Eric Myhre <hash@exultant.us>

53df5195

08 Mar, 2019 1 commit
- Changes to make my bridge compile · 06628466
  hannahhoward authored Mar 05, 2019
  
  06628466
04 Mar, 2019 1 commit
- doc: more Node docs. · cd838c48
  Eric Myhre authored Mar 04, 2019
```
Signed-off-by: Eric Myhre <hash@exultant.us>
```
  cd838c48
27 Feb, 2019 2 commits

Beginye another pass on docs. · d168547a

Eric Myhre authored Feb 27, 2019

Starting with a new index, which enumerates lots of things which shall
deserve at least one page or section.

Big picture also includes links out to things which shall require their
own (as yet unwritten) pages.

Old ball-of-mud "dev.md" already being eroded into the new files.
Signed-off-by: Eric Myhre <hash@exultant.us>

d168547a

Merge branch 'traversals'. · aaea73ad

Eric Myhre authored Feb 27, 2019

Incomplete, but progress.

There are still many many todos in the content, but that branch
has simply gotten long enough.  And I need to spend some time
documenting (and docs have diverged on master), so it's time to
reel it in.

aaea73ad

21 Feb, 2019 6 commits

Merge branch 'encoding' into traversals · cd841fb1
Eric Myhre authored Feb 21, 2019

cd841fb1

Wiring JSON and CBOR to multicodec in repose! Yas! · 749cf35c

Eric Myhre authored Feb 21, 2019

Finally.

This is "easy" now that we've got all the generic marshalling
implemented over Node and unmarshalling onto NodeBuilder:
we just glue those pieces together with the JSON and CBOR
parsers and emitters of refmt, and we're off.

These methods in the repose package further conform this to
byte-wise reader and writer interfaces so we can squeeze them
into Multicodec{En|De}codeTable... and thence feed them
into ComposeLink{Loader|Builder} and USE them!

And by use them, I mean not just use them, but use them transparently
and without a fuss, even from the very high level Traverse and
Transform methods, which let users manipulate Nodes and NodeBuilders
with *no idea* of all the details of serialization down here.

Which is awesome.

(Okay, there's a few more todo entries to sort out in the link
handling stuff.  Probably several.)

But this should about wrap it up with encoding stuff!  I'm going
to celebrate with a branch merge, and consider the main topic to
be lifted back up to the Traverse stuff and how to integrate
cleanly with link loaders and link builders.
Signed-off-by: Eric Myhre <hash@exultant.us>

749cf35c

New marshal implementation! Generic. Woo! · f150a81b

Eric Myhre authored Feb 21, 2019

We have both generic marshal and unmarshal -- they should work for any
current or future ipld.Node implementation, and for any encoding
mechanism that can be bridged to via refmt tokens.

Tests are also updated to use builders rather than the ancient
"mutable node" nonsense, which removes... I think nearly the last
incident of that stuff; maybe we can remove it entirely soon.

As when we moved the unmarshal code into its generic form, most of
this code already existed and needed minor modification.  Git even
correctly detects it as a rename this time since the diff is so small.
And as when we moved the unmarshal code, now we also remove the
whole PushTokens interface; we've gotten to something better now.

Finally we're getting to the point we can look at wiring these up
together with all the multicodec glue and get link loading wizardry
at full voltage.  Yesss.  Sooon.
Signed-off-by: Eric Myhre <hash@exultant.us>

f150a81b

Remove unused field in fluent.NodeBuilder. · 5f70f2e3
Eric Myhre authored Feb 21, 2019
```
Totally unexported, never touched.  Baleet.
Signed-off-by: Eric Myhre <hash@exultant.us>
```
5f70f2e3

Remove Build from fluent.{Map|List}Builder view. · 0776b93b

Eric Myhre authored Feb 21, 2019

Per comments in this diff as well as discussion in previous commit's
message, there's no need for it to be exported anymore.

In fact, the only possible way it could've been used would've been
invalid -- calling those methods in the closure body would've caused
the list/map builder to invalidate itself, and make the library's
later Build call almost certainly error. So, it's very much better
not to have it exported at all.

This has brought us to an odd result: we could very nearly be dropping
the fluent.{Map|List}Builder *interfaces* (though of course keeping
their internal error-into-panic implementations) and just continuing
to fit the existing ipld.{Map|List}Builder interfaces (but you'd be
within your rights to ignore all error returns; they'd be redundant).
Would this be helpful, though? I dunno. So let's not.
Signed-off-by: Eric Myhre <hash@exultant.us>

0776b93b

Significant changes to fluent.{List|Map}Builder. · f316102c

Eric Myhre authored Feb 21, 2019

Two changes are combined here: these builders now work in tandem with
closures (and call-chaining style is being backed away from entirely);
and, these new closures also get NodeBuilder values supplied to them.

This correctly prepares for NodeBuilder specializations to be viable
in the future.  Although this need is not "yet" exercised,
typed.NodeBuilder won't be able to keep its semantics properly
without this -- It's similar to the existing comments in Unmarshal code
which mention how we'll require specializations for typed.Node usage
in the future around the recursion sites there.  And while this isn't
exercised yet, getting the API correctly-shaped earlier than later
will save a lot of refactoring fuss.

I have also just plain been wanting a closure style syntax, because it
nests nicely and does more of the right things with visual alignment.
So now we have one.  Nice.

The diffs in the test code using this new style of builder should be
a nice example of how this looks compared to the old API syntax.
I think those diffs speak for themselves.

(It absolutely would be even nicer if we had a syntax for declaring
closures in golang that could infer the types of their arguments
without putting such a heavy textual weight on the closure declaration
site.  There's no way to do this in Golang currently, but I'm pretty
certain it would be a feasible language enhancement without much
in the way of theoretic stretches nor significant compiler complexity
nor additional user complexity.  This might be something to pursue.)

---

One interesting design point to debate within the closure design here:
should we really try to 'save' the closure from calling Build?
(This code does.)

On a first thought, perhaps not: doing the Build call in the library
code deprives the author of the closure from being able to Recover
errors from the final Build call if desired.
Doubly problematically, those Build errors would carry worryingly
less meaningful stack traces, since such errors would
be raised from library code rather than the closure.

On the second thought: both those issues seem easily addressed
in actual usage.  The stack trace will still capture the line of the
`CreateFoo(fn(...` invocation, and that line should be sufficiently
proximate to user (rather than library) code to be meaningful.
(There's approximately no reason I can imagine to be passing any
ListBuildingClosure functions from far away.  If we had ruby syntax,
this stuff would all be a 'do' block at the end of line, and
it would be syntactically unsupported to provide from a distance.)
It's even possible to wrap exactly the `CreateFoo` call in a Recover,
which makes it possible to get every bit as much control as if we
made the Build call the closure's responsibility and wrapped it there.

In light of those 'second' thoughts -- and also because having the
{Map|List}BuildingClosure definitions include an 'ipld.Node' return
type *in addition* to their already irritating length is a not
insignificant exacerbation of usage friction -- I'm opting to keep
the Build call in the fluent library code for now.
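
A minimal sketch of the trade-off discussed above, using hypothetical names (`listBuilder`, `createList`, `tryCreateList`) rather than the fluent package's real types: the library runs the closure and performs the final build itself, and a caller who wants control over errors wraps exactly that one call in a recover.

```go
package main

import (
	"errors"
	"fmt"
)

// listBuilder is a hypothetical stand-in for a fluent list builder.
type listBuilder struct {
	items []string
}

// Append panics on error, in the error-into-panic fluent style.
func (lb *listBuilder) Append(s string) {
	if s == "" {
		panic(errors.New("empty entries are rejected in this sketch"))
	}
	lb.items = append(lb.items, s)
}

// createList runs the closure and then performs the final build step
// itself, so the closure never calls Build.
func createList(fn func(lb *listBuilder)) []string {
	lb := &listBuilder{}
	fn(lb)
	return lb.items // the "Build" step, done by library code
}

// tryCreateList wraps exactly the createList call in a recover, turning
// builder panics back into an ordinary error return.
func tryCreateList(fn func(lb *listBuilder)) (out []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			e, ok := r.(error)
			if !ok {
				panic(r) // not a builder error; re-raise it
			}
			err = e
		}
	}()
	return createList(fn), nil
}

func main() {
	out, err := tryCreateList(func(lb *listBuilder) {
		lb.Append("a")
		lb.Append("b")
	})
	fmt.Println(out, err)

	_, err = tryCreateList(func(lb *listBuilder) {
		lb.Append("")
	})
	fmt.Println(err)
}
```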

---

(This fix comes here and now because writing additional test fixtures
for the next-up marshalling code without these QoL improvements was
just not any fun, and the more critical future issues jumped in front
of my eyes at roughly the same time.)

---

Also worth noting in this diff: the large REVIEW comment in the
MapBuildingClosure docs about having a NodeBuilder for keys.

I haven't added all of the notes I have on the strings-vs-Nodes
questions around map key APIs to tracking in git, and probably should;
it's turned out more fiddly than expected.  This comment here probably
already gives a fair insight into both how and why and how much so.

The NodeBuilder-for-keys param is included in MapBuildingClosure
for now out of an abundance of caution.  But this is indeed worthy
of review and might be elided in the future.
Signed-off-by: Eric Myhre <hash@exultant.us>

f316102c

20 Feb, 2019 2 commits

New unmarshal implementation! Generic. Woo! · be01e1e5

Eric Myhre authored Feb 20, 2019

This unmarshal works for any NodeBuilder implementation, tada!

Old ipldfree.Node-specific unmarshal dropped... as well as that entire
system of interfaces.  They were first-pass stuff, and I think now it's
pretty clear that it was barking up the wrong tree, and we've got better
ideas to go with now instead.  (Also, as is probably obvious from a skim,
the old code flipped pretty clearly into the new code.)

Turns out refmt tokens aren't a very relevant interface in IPLD.
I'm still using them... internally, to wire up the CBOR and JSON
parsers without writing those again.  But the rest of IPLD is more
like a full-on and principled alternative to refmt/obj and all its
reflection code, and that's... pretty great.

Earlier, I had a suspicion that we would want more interfaces for token
handling on each Node implementation directly, and in particular I
suspected we might use those token-based interfaces for doing transcription
features that flip data from one concrete Node implementation into another.
(That's why we had this ipldfree.Node-specialized impl in the first place.)
**This turns out to have been wrong!**  Instead, now that we have the
ipld.NodeBuilder interface standard, that turns out to be much better suited
to solving the same needs, and it can do so:

- without adding tokens to the picture (simpler),

- without requiring tokenization-based interfaces be implemented per
concrete ipld.Node implementation (OH so much simpler),

- and arguably NodeBuilder is even doing it *better* because it doesn't
need to force linearization (and while usually that doesn't matter... one
can perhaps imagine it coming up if we wanted to do a data transcription
in memory into a Node implementation which has an orderings requirement).

So yeah, this is a nice thing to have been wrong about.  Much simpler now.

Old ipldfree.Node-specialized 'PushTokens' is still around.  Not for long,
though; it just hasn't finished being ported to the new properly generalized
style quite yet.

Note, this is not the *whole* story, still, either.  For example, I still
expect to have an ipldcbor.Node which someday has a *significantly* different
set of marshal and unmarshal methods -- it may eschew refmt entirely,
and take the (very) different strategy of building a skiplist over raw
byte slices! -- that will exist *in addition* to the generic implementations
we're doing here and now.  More on that soon.

Yeah.  A lot of interfaces to get lined up, here.  Some of them tug in such
different directions that picking the right ones to make it all possible
seems roughly like solving one of the NP-hard satisfiability problems.
(Good thing it's actually with a small enough number of choices that it's
tractable; on the other hand, enumerating those choices isn't fast, and
the 'verifier' function here ain't fast either, and being a "design" thing,
it can only be evaluated on human wetware.  So yeah, an NP problem on a
tractable domain but slow setup and slow verifier.  Sounds about right.)

(uh, I'm going to write a book "Design: It's Hard: The Novel" after this.)

Tests are patched enough to keep working where they are; I think it's
possible that a reshuffle of some of them to be more closely focused on
the marshal code rather than the node implementation packages might be
in order, but I'm going to let that be a future issue.  (Oh, and they
did shine a light on one quick issue about MapBuilder initialization,
which is also now fixed.)
Signed-off-by: Eric Myhre <hash@exultant.us>

be01e1e5

Fixing MapBuilder error exposure. · eafc200a

Eric Myhre authored Feb 20, 2019

This is the first commit going down a long and somewhat dizzying prerequisite tree:

- For graphsync (an out-of-repo consuming project) we need selectors
- For Selectors we need traversal implemented
- For Traversal implementations we need link loaders [‡]
- For link loading we need all deserialization implemented
- (and ideally, link creation is done at the same time, so we don't get surprised by any issues with the duals later)
- and it turns out for deserialization, we now have some incongruities with the earlier draft at MapBuilder...

So we're all the way at bugfixes in the core ipld.MapBuilder API. Nice.

([‡] Some of those jumps are a little strained. In particular, traversal doesn't
*in general* need link loaders, so we might choose a very different implementation
order (probably one that involves me having a lot less headaches)... *except*,
since our overall driver for implementation order choices right now is graphsync,
we particularly need traversals crossing block boundaries since we're
interested in making sure selectors do what graphsync needs. Uuf.)

What's the MapBuilder design issue? Well, not enough error returns, mostly.
We tried to put the fluent call-chaining API in the wrong place.

Why is this suddenly an issue now? Well, it turns out that properly genericising
the deserialization needs to be able to report error states like invalid
repeated map keys promptly.

Wait, didn't we even *have* deserialization code before? Yes, yes we did.
It's just that that draft was specialized to the ipldfree.Node implementation...
and so it doesn't hold up anymore when we need it to work in general traversal.

Okay, so. That's the stack depth.

With all that in mind...

This diff adds more error return points to ipld.MapBuilder, and maintains the
fluent variant more closely matching the old behavior (including the
call-chaining style), and fixes tests that relied on this syntax.

Duplicate keys rejection is also in this commit. I thought about splitting it
into further commits, but it's so small. (We may actually need more work in
the future to enable Amend+(updating)Insert, but that's for later; perhaps
an Upsert method; whatever, I dunno, out of scope for thought at the moment.)
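
As a sketch of what prompt error reporting on insert looks like, assuming a simplified string-keyed builder (`mapBuilder` and its methods are hypothetical, not the ipld.MapBuilder interface itself): each Insert returns an error, so a repeated key is reported at the point it occurs rather than at the end of the build.

```go
package main

import "fmt"

// mapBuilder is a hypothetical, string-only stand-in for a map builder
// whose Insert reports errors immediately.
type mapBuilder struct {
	keys   []string // insertion order
	values map[string]string
}

func newMapBuilder() *mapBuilder {
	return &mapBuilder{values: make(map[string]string)}
}

// Insert rejects repeated keys at the moment they are seen, so a decoder
// driving this builder can fail promptly.
func (mb *mapBuilder) Insert(k, v string) error {
	if _, exists := mb.values[k]; exists {
		return fmt.Errorf("repeated map key %q", k)
	}
	mb.keys = append(mb.keys, k)
	mb.values[k] = v
	return nil
}

// Build returns the finished map. It also has an error return, mirroring
// the idea that any step of building may fail.
func (mb *mapBuilder) Build() (map[string]string, error) {
	return mb.values, nil
}

func main() {
	mb := newMapBuilder()
	fmt.Println(mb.Insert("a", "1"))
	fmt.Println(mb.Insert("a", "2"))
	m, err := mb.Build()
	fmt.Println(m, err)
}
```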

And then we'll carry on, one step at a time, from there. Whew.

---

Sidebar: also dropping MapBuilder.InsertAll(map[Node]Node) from the interface.
I think this could be better implemented as a functional feature that works
over a MapBuilder than being a builtin, and we should prefer a trim MapBuilder.
And might as well drop it now rather than bother fixing it up just to remove later.

---

ipld.ListBuilder also updated to no longer do a call-chaining style API, while
fluent.ListBuilder continues to do so. This is mainly for consistency;
we don't have the same potential for mid-build error conditions for lists
as we do with maps, but ipld.ListBuilder and ipld.MapBuilder should be similar.

---

Aaaaand one more! NodeBuilder.{Create,Append}{Map,List}() have ALL been
updated to also return errors. Previously, the append methods had an error
state if you used them when the NodeBuilder was bound to a predecessor node
of an unmatching type, but they just swallowed them into the builder and
regurgitated them (much) later; we're no longer doing this. Additionally,
it's occurred to me that *typed* builders -- while not so much a thing, yet,
certainly a thing that's coming -- will even potentially error on CreateMap
and CreateList methods, according to their type constraints. So, jump that now.

...

Yeah, basically a whole tangle of misplaced optimism about error paths in
NodeBuilder and its whole set of siblings has been torn through at once here.
Bandaid ripping sound.
Signed-off-by: Eric Myhre <hash@exultant.us>

eafc200a

14 Feb, 2019 2 commits

Introduce repose.NodeBuilderChooser. · dfd17c2e

Eric Myhre authored Feb 14, 2019

Mea culpa for all these absolutely terrible placeholder names.

Previous comment on MulticodecDecoder about it probably currying a
NodeBuilder inside itself was wrong. (Message of the previous commit
explores this in further detail already, so I won't repeat that here.)

We'll be making concrete implementations of MulticodecEncoder and
MulticodecDecoder in this library for at least JSON and CBOR (for
"batteries-included" operation on the most common uses); other
implementations (like git, etc) will of course be possible to register,
but probably won't be included in this repo directly (for the sake of
limiting dependency sprawl). When we get to them, those two core
implementations should be a good example of the probing-for-interfaces
described in the comments here.
Signed-off-by: Eric Myhre <hash@exultant.us>

dfd17c2e

Link loaders and their dual. · a50962ae

Eric Myhre authored Feb 14, 2019

This is a first and incomplete draft, and also with placeholder names.

The whole topic of loading links and writing nodes out to a hashable
linkable form is surprisingly parameterizable.
On the one hand, great, flexibility is good.
On the other hand, this makes it intensely difficult to provide a
simple design that focuses on the essence of user wants and needs
rather than getting bogged down in a torrent of minutiae.

We want to be able to support a choice of abstraction in storage,
for example -- getting and putting of raw bytes -- without forcing
simultaneously engaging with and reimplementing the usage of multihash,
the encoding and multicodec tagging, and so on.
This is easier said than done.

The attempt here is to have some function interfaces which do the
aggregated work -- and these are what we'll actually use in the rest of
the codebase, such as in the TraversalConfig for example -- and have
some functions to build them out of closures that bind together all
the other details of codecs and hashing and so forth.
In theory you could build your own LinkLoader entirely from whole
cloth; in practice, we expect to use ComposeLinkLoader in almost 100%
of all cases.

I think this is on the overall right track, but there's at least a few
goofs in this draft: specifically, at the moment, somehow the pick of
Node impl got kicked to all the way inside the MulticodecDecoder.
This is almost certainly wrong: we *absolutely* want to support users
picking different Node implementations without having to reconfigure
the wiring of marshal and unmarshal! For example: we need to be able
to load things into typed.Node implementations with concrete types
that were produced by codegen; and we don't want to force rewriting
the cbor parser every time a user needs to do this for a new type!
Something further is required here. We'll iterate in coming commits.

The multicodec tables are an attempt to solve that parameter in a
fairly straightforward plugin-supporting way. This is important
because IPLD has support for some interesting forms of serialization
such as "git" -- and while we want to support this, we also absolutely
do not want to link that as a strict essential dependency of the core
IPLD library. Where exactly this is going should also become clearer
in future commits -- and remember, as described in the previous
paragraph, at least one thing is completely wrong here and needs
a second round of draft to unkink it.

There are some inline comments about deficiencies in the multihash
interfaces that are currently exported. Some improvements upstream
would be nice; for now, I'm simply coding as if streaming use *was*
supported, so everything will be that much easier to fix when we get
the upstream straightened out.

The overall number of parameters in LinkBuilder is still pretty
overwhelming, frankly, but I don't know how to get it down any further.

The saving grace of all this is that when it's put to work in the
traversal package, things like traversal.Transform will simply stash
the metadata for CID and multicodec and multihash from any links
traversed... and then if an update is propagated through, it'll just
pop that metadata back up for the saving of the new "updated" Node.
The whole lifetime of that metadata is on the stack and the user
never should be bothered by it as long as they're doing "updates".
(Users creating *new* objects get the full facefull of choices,
of course. I don't know if anything can be done about that.)

It's possible that the whole
"multicodecType uint64, multihashType uint64, multihashLength int"
suite of arguments to LinkBuilder should be conveyed in a tuple;
it's not really likely that they'll ever vary independently, and in
general whole applications have probably picked one set of values
and will stick with it application-wide. The `cid.Prefix` struct
even matches this already. But it's... not really clear what's
going on in that package, frankly. There seem to be several competing
variations on builder patterns and I've got no idea which one we're
"supposed" to be using. Therefore, I'm not using cid.Prefix for
this purpose until some more conversations are had about that.

Also note ActualStorer (placeholder name) and StoreCommitter
as a pairing of functions. Previous generations of library have
conjoined this, based on assumption that we'll have "blocks", and
those are small enough that it's "fine" to have them completely
in memory, and do a hashing pass on them before beginning to write.
This is nonsense and the buck stops here. A two-phase operation
with commit to a hash at the *end* is a strictly more powerful
model -- it's easy to implement buffering and all-at-once write
when you have a two-phase interface; it's impossible to implement
a useful two-phase interface when you're forced to start from
an overbuffered single-step interface -- and lends itself better
to efficiency and a lower high-water-mark for memory usage.
Making this choice here in the IPLD libraries might make it
moderately harder to connect this to existing IPFS blockstore code...
but it'll also make it easier to write other correct storage and
loaders (think: opening a single local file with a temp name, and
atomically moving it to a correct CAS location in the commit call),
as well as eventually be on the right path when we start fixing other
IPFS library components to be more streaming friendly.

tl;dr design work. It's hard.
Signed-off-by: Eric Myhre <hash@exultant.us>

a50962ae