Commits · 80febfe2b98092f3d4ac93a7b3519f0505582dc9 · ld / go-ld-prime

27 Feb, 2020 11 commits

Add sanity check on visit count to traversal bench · 80febfe2
Eric Myhre authored Feb 27, 2020

80febfe2
Merge branch 'cycle-breaking-and-traversal-benchmarks' · 61ebf6e2
Eric Myhre authored Feb 27, 2020

61ebf6e2

Finally I can add this: traversal benchmarks. · bed1eff1

Eric Myhre authored Feb 27, 2020

Lots of import cycle fixing was necessary to get here, which was an
unpleasant surprise at the start of the day... but lots of things are
better off now, so it was highly worth it.

bed1eff1

Ah yes, fix TypedLinkNode, the actual cycle cause. · 9d87c836
Eric Myhre authored Feb 27, 2020
```
This is the last part of the cleanup and cyclic import fix started
in the previous commit.
```
9d87c836

Yank TypedNode interface into schema package. · a2ea4706

Eric Myhre authored Feb 27, 2020

Previously it was in the 'impl/typed' package, next to the
runtime-wrapper implementation of the interface.  This was strange.

Not only should those two things be separated just on principle,
this was also causing more import cycle problems down the road:
for example, the traversal package needs to consider the *interface*
for a schema-typed node in order to gracefully handle some features...
and if this also brings in a *concrete* dependency on the
runtime-wrapper implementation of typed nodes, not only is that
incorrect bloat, it becomes a show stopper because (currently, at
least) that implementation also in turn transitively imports the
ipldfree package for some of its scalars.  Ouchouch.

So.  Now the interface lives over in the 'schema' package, with all
the other interfaces for that feature set.  Where it probably always
should have been.

('typed.Maybe' also became known as 'schema.Maybe', which... does not
roll off the tongue as nicely.  But this is a minor concern and we
might reconsider the naming and appearance of that thing later anyway.)

a2ea4706

traversal.NodeBuilderChooser can now return error. Also, the default no... · 50dfd994

Eric Myhre authored Feb 27, 2020

traversal.NodeBuilderChooser can now return error.  Also, the default no longer returns ipldfree.NodeBuilder.

The traversal package having a dependency on the ipldfree Node
implementation package was problematic: it's important that we be able
to benchmark traversal *in combination* with different implementations
of Node, and it turns out if we want to build reusable test and
benchmark functions for that, we get cyclic dependencies when then
trying to use them on our most common implementation!  Ouch.

Having a default implementation of Node referenced was originally out
of a desire for convenience.  But it's logically dubious anyway.
Even the convenience gain in practice is questionable since one still
always needs to configure a LinkLoader in the same areas.  At any rate,
hopefully the error message is helpful enough to make this low-impact.

(We could probably benefit from more helper methods around this, too.
But we can review that later.  Preferably after the 'NodeStyle' feature
from the R&D branches lands, also, which should make this whole area
better named, clearer about where its allocations arrive, and just
less fidgety in general.)

50dfd994

Start new 'corpus' package for tests. · c0cd2769

Eric Myhre authored Feb 27, 2020

Getting *enough* and sufficiently *organized* corpuses becomes a
legitimate challenge.

The docs outline some of the directions this will go while describing
the naming convention.

This naming convention has already been cropping up in an ad-hoc way in
recent commits; this is a step towards documenting it consistently.

There aren't many entries yet; expect it to grow.

Using JSON as the defacto format is a little aggressive, perhaps,
because it makes sort of a wide dependency span.
But since we've already long had unmarshalling of json working,
it seems viable in practice.
And it means we get the marshalling output target corpuses for free
for at least one of our formats.
And it means we can readily make comparisons to stdlib json,
which is nice for having baselines to frame comparisons against.
It also has the interesting sideeffect of making these corpuses
immune to change in the face of refactors to NodeBuilder (which
will be an absurd concern at any time except... right now).
(We'll see.  Maybe I'll regret this after some time passes.  But if so,
this content probably just pivots to being still useful in json
marshal and unmarshal tests.)

I'd like to put this to work in writing more traversal benchmarks...
but that's going to have to wait a few commits, because I've found some
import cycles that get very problematic when I try to proceed there,
and it looks like they might take a few steps to sort out.

c0cd2769

Merge branch 'assembler-upgrade-to-codecs' · d12fcaa4
Eric Myhre authored Feb 27, 2020

d12fcaa4
Merge pull request #47 from ipld/path-clarifications · 4612357c
Eric Myhre authored Feb 27, 2020
```
Path clarifications
```
4612357c

Expand the remark on slashes in paths. · 8093be74

Eric Myhre authored Feb 27, 2020

Previous commit with suggestions from reviewer made me look at this
line again and realize it was *too* specific.

8093be74

Add explicit documentation re null bytes to Path. · 1e3e2ff4

Eric Myhre authored Feb 27, 2020

There's already explicit sections for empty string and slash cases,
on the basis of how likely those are to provoke questions
(even though the content is essentially to highlight how *not*
exceptional they are); it makes just as much sense to do the same
call out for null bytes.

Original text started in
https://github.com/ipld/go-ipld-prime/pull/47/files#r384127483 .
Co-Authored-By: Peter Rabbitson <ribasushi@leporine.io>

1e3e2ff4

24 Feb, 2020 4 commits

Addntl comments on PathSegment equality. · a48c9db9
Eric Myhre authored Feb 23, 2020

a48c9db9

Fix for empty string PathSegment. · 98fdaf17

Eric Myhre authored Feb 23, 2020

Yep, it makes a lot of creation by struct uglier, as you can see in the
diff to the test file.  Fortunately?  Nobody outside the package is
allowed to do that anyway, so, we can pretty much ignore the ergonomic.
(It'll still look uglier if someone's looking at stuff with spew or
go-cmp or some other library like that which peeks naughtily into
unexported fields, but that's hard to do much about.)

We could also address this by adding a discriminant field to the
PathSegment struct.  (If we wanted to use uint internally, we'd
probably have to do that, in fact.  Either that or start using
some other alarmingly magical number like UINT_MAX.  Ygh.)  I didn't.
I don't think we handle lists larger than 2 billion elements
particularly well anyway... so, adding more words of memory to
PathSegment to support that case seems like an entirely losing trade.

98fdaf17

Quick null marshall benchmark. · 3cdd0cbb

Eric Myhre authored Feb 24, 2020

This one takes 489ns/op... indicating that indeed 54.6% of the time
spent by the real marshalling of json is on the json details.

I would've thought this would be even larger, actually.  Hm.

May keep this 'null' marshaller.  Not sure.  Does it tell a useful
story, or tell a story clearer than doing the same thing with json?
Signed-off-by: Eric Myhre <hash@exultant.us>

3cdd0cbb

Port codecs; add benchmarks. · d3ddbfee

Eric Myhre authored Feb 24, 2020

Porting codecs to the new NodeAssembler interfaces was straightforward.

The new codecs exist in the nodesolution "research" dirs for now,
coexisting with the soon-to-be-legacy encoding package.
This means we can see benchmarks of both the old and new designs within
this commit.  (We'll probably give up on this shortly -- when dealing
with the traversal package too, it's gonna stop being reasonable -- but
for now it's still possible and provides interesting information.)

And how *is* that performance, you ask?

Peachy.

Ballpark answers for marshalling:

- 1079ns/op for the new Node
- 1435ns/op for the old Node
- 1559ns/op for stdlib json marshal of a native map.

144% better than the operations of stdlib json is pretty acceptable.
(Will more intense codegen beat that?  Oh for sure.  But this is
*without any codegen*, so this is quite satisfactory.)

Note that much of that time left is probably dominated by
serialization-related allocations rather than the node traversal.
I didn't dive into the pprofs to verify that yet, though.
This picture of the overall act of marshalling is nice to have
since it's a practical end-to-end user story.

This test is also on a very small piece of data, and I expect
the improvements will be further much bigger on larger or
deeper-recursing structures.

And lest this be skimmed over: the excellence of doing better than
stdlib's json **while having pluginable codecs** cannot be understated.

Pretty happy with this.

How's unmarshal?  Eh.  About the same as before.  Remember, we chose
*not* to do a lot of amortizations in the new 'basicnode'
implementations, because although we *could* (and it's quite clear how
to do so), the increase in memory size we'd face since go doesn't allow
unions was deemed too large of a constant factor multiplier.
We *will* see these improvements in codegen, and we can also make
variants of 'basicnode' that do these amortizations in the future.

Doing a lot of thinking about how benchmarks and tests will be managed
as they continue to grow in count and in variation of semantic targets.
Might have to write some tooling around it.  We'll see.

d3ddbfee

23 Feb, 2020 9 commits

Merge branch 'research-admissions' · 530ccd68

Eric Myhre authored Feb 23, 2020

The '_rsrch/nodesolution' package is almost ready to replace the
current core go-ipld-prime interfaces,
and the '_rsrch/nodesolution/node/basic' corresondingly replace
the current 'impl/free' package.

530ccd68

basicnode's constructors shall all prefix 'New'. · 29c1b98e

Eric Myhre authored Feb 23, 2020

This first crossed my mind when considering whether it might be a good
idea to export the concrete implementation types of the 'basicnode'
package (see previous couple of commits -- it mostly had to do with the
question of whether we could reuse more code and shrink the size of
packages emitted by codegen)... but even though that's resolved to a
pretty solid "nope", it still makes sense to have a consistent name
prefix pattern here... and we'll probably encourage and expect to see
similar conventions in other packages (including in codegen, if we
add a handy-constructor-funcs feature to that).

29c1b98e

doc typo fix · 139d1eb6
Eric Myhre authored Feb 23, 2020

139d1eb6

gendemo now uses the standard test mixins. · b37142b6

Eric Myhre authored Feb 23, 2020

And with this, the gendemo package both compiles and passes tests
once again.  yay!  Onward!

b37142b6

gendemo now includes own scalars, & design doc. · bddec04c

Eric Myhre authored Feb 23, 2020

Check out the 'HACKME_scalars.md' file for why this commit represents
a bunch of nontrivial decisions. There are a lot of possible ways
the compile failures here could be fixed, but some of them would have
bigger consequences of longterm weirdness than others.

Or at least, it certainly looks that way from here. Maybe we'll see.

The scalar implementations are almost exact replicates of the basicnode
implementations. Oddly, they haven't been exported yet. This may
change (again, see discussion in the HACKME_scalars document) -- in
which case the code will also change to use wrapper structs.
Note one other divergence: the error messages have different content
for the typename: we mention the package, since this will probably be
typical and useful for codegen-created types in general.

The gendemo package almost compiles again. The rest of the fixes
aren't related to this topic, so come in the next commit.

bddec04c

Document an already true fact about PathSegment's zero value. · 2d590abe

Eric Myhre authored Feb 23, 2020

I... honestly hadn't even realized this before; I just noticed it
when making a refactor that would produce this end state, and thought
"oh, that's silly".  Evidently... it's fine, since this has already
been the case, and not been problematic!  Alright then.
Apparently it's not too silly after all.

2d590abe

Spelling fix. · 0f7de4a9
Eric Myhre authored Feb 23, 2020

0f7de4a9

Further clarification about string encoding. · f163fee9

Eric Myhre authored Feb 23, 2020

It's not very often we care about these things in Go, and when we do,
this stance of "string is really just bytes; guard it yourself" is
the community norm... so, these docs should *almost* go without saying.
However, since IPLD is all about cross-language consistency, and
we may have people from other programming language ecosystems reading
our docs, it seems good to be as clear and explicit as possible.

f163fee9

Several significant clarifications to Path docs. · 2e1845b3

Eric Myhre authored Feb 23, 2020

Primarily, being commital and direct about how special values, erm...
*aren't* -- namely, "/" is a valid segment, and so is empty string --
and confessing how much tricky work that makes.

Several methods are now more upfront about how much they *do not do
good things* on such values.

Added some new constructors for Path which *do* work for all possible
values.  (You could've handled such tricky values with a series of
Join calls, previously, but that would be ergonomically grueling.)

These facts are all defacto 'true' to the best of my knowledge now.
However, the IPLD specs repo is a bit on the quiet side about them
at present (precisely because it's such an irritating little nest of
edge cases and fun stuff)... and so the comments are also littered
with "this may change" warnings.  Still, it's better to be accurate
about what the code does and does not do in its current state.

I'd *like* to formally specify an escaping system and canonical
string encoding for paths.  That should start in the IPLD Specs
repo, though, and involve fixtures, so I won't start it here now.

Thanks to @ribasushi for the kick in the shins that these docs needed
work and clarifications.

2e1845b3

20 Feb, 2020 1 commit

Mega reorg; split prototype packages; much closer to ready to merge coreward as a result. · 9523d918

Eric Myhre authored Feb 20, 2020

Turns out there was a lot of tech debt already accumulating very very
rapidly due to the earlier whimiscal choice to lump the new basicnode
implementations and the demo-codegen implementations into the same
packages as benchmarking and research on this branch began.

The basicnode package is now extracted cleanly, and exists in the
'node/basic' path.  This 'basicnode' package is the equivalent of what
was formerly called 'ipldfree' on the master branch.

The demo codegen content is now under 'node/gendemo'.
Note that it is BROKEN in this commit; there was too much there to
clean up and address in one commit, and this one has already become
quite large and includes a bewildering number of moves as well as
finer content rearrangements.  Fixes will come in next commit(s).

(Why?  Well, the demo codegen content leaned on the basicnode types
for string and int.  Turns out this actually opens *quite* the
can of worms when you try to rectify it.
At the surface level, the basicnode package's types aren't exported.
Could we export them?  Well, sure.  Except that's only half the trick.
Can we export the *assemblers* for those types?  Well, sure.  But...
Can we export the 'w' fields for those assemblers for those types?
**NO** -- this is where it all breaks down -- if those fields are
exported, we lose all control over all immutability everywhere.
Shoot...?  Very shoot.
One clearly correct solution to this is to generate new types for
all of the primitive scalars in every codegen output package;
the downside, obviously, is that's a bit verbose in resulting code.
There might be another very alexandrian solution to this gordian knot:
exporting the basicnode package's types after all, but not their
assemblers; then leaning further into their implementation detail
of being implemented as typedefs rather than strictly enclosed structs,
and having every part of codegen types that handle primitive leafs
directly perform and manage the relevant casts.  The bigger picture
wisdom of this has not been completely analyzed yet.)

The 'impls' path has in general become 'node'.
This will stay this way when we finally merge over core on master;
all the current 'impl' (without the s; sigh) paths will become 'node'.

Numerous new HACKME files have appeared; some are splits of old ones
to account for the new package splits; some are hoisted and polished
from what were previously commentstorms in the code; some are
completely new.  There are also a few new package-scope godocs.

The benchmarks and tests are rearranged and much (much) improved.
Naming is more consistent; and things are consistently extracted
to the 'node/mixins/tests' package, and imported and used from there.
(If you want all the same benchmark info as before, you'll now have
to run benchmarks from several different packages.)

9523d918

12 Feb, 2020 3 commits

Update a few details of node builder rules docs. · 1e7fed33
Eric Myhre authored Feb 12, 2020

1e7fed33

comment cleanup: drop more speculations. · 86e60fa7

Eric Myhre authored Feb 12, 2020

The last line is still particularly interesting and nonobvious, though.
It's good that the current design does *not* expose memory anywhere
during mid-construction, even to parts that are semi-finalized:
if we did, and it so happened that a backing array had to grow, we'd
end up releasing pointers to several backing arrays into the wild,
which would probably end up rather terrible and rather hard to see.

86e60fa7

comment cleanup: this one sailed already. · 830908e2

Eric Myhre authored Feb 12, 2020

It's become clear that `map[string]struct{k, v Node}` is the correct
choice in a wide variety of ways; we've already gone all in on that.

The rest of the comment is speculative, and not wrong, but it'll be
just as well to reboot those speculations later, and many of the things
that were in flux when that comment was penned aren't anymore, so a
revisit should be simpler.

830908e2

10 Feb, 2020 3 commits

Address varying ValueStyle for structs. · dfd6c013

Eric Myhre authored Feb 10, 2020

This is a rather important detail to remember, and one that is easy to
lose track of if not actively wrangling the schema typed usecases.
(And so, I almost did.  Uufdah.)

The various methods for getting NodeStyle out of assemblers are now
much better documented.

As my current goals are still revolving around finishing the new
'ipldfree'/'basic' nodes, wrapping up this whole research branch,
and bringing these interface back to land in the library core...
I'm still not bothering to substantially handle the inside of the
ValueStyle methods on the example struct types.  But it should be
clear enough how to implement them correctly now.

dfd6c013

Clarify how nodestyles are reported, particularly for data inside recursive kinds. · cc8eec6c

Eric Myhre authored Feb 10, 2020

Previous description -- which stated the style of a node may vary based
on how it was created -- was outright wrong, and specified things both
difficult to implement and fairly useless.

There are still some less than completely written parts of the story
for generic transformations, but this definition is at least closer.

cc8eec6c

Finish the new basic implementation of 'any'. · b0dfbd23

Eric Myhre authored Feb 10, 2020

What it does is simple: an 'anyBuilder' will either delegate to a map
builder, or a list builder, or do a scalar build (directly, because
there's not enough to do there to justify using any indirection), *OR*
even simply hang onto and pass through another Node.

The tests ensure that it can be used transparently to pass all the
string tests (yay) as well as all the map tests (yayy!) including those
that exercise recursion (yayyy!!). In addition to demonstrating that
the 'any' code works correctly, it's also a nice demostration of
reusable tests and behavioral specifications paying off. (There's more
work to do here to clean them up, standardize naming, etc, but it's
a nice start.)

The 'anyInhabitedBy{Kind}' stuff stamped out at the bottom is... well,
committing it for posterity, but in fact I'm going to remove that in
the very next commit, as well as some docs on the subject from a few
commits ago that are now showing as incorrect: the idea of the style
of a node varying based on how it was produced is really... *cough*
I'm not sure how I got that in my head; it clearly doesn't fly right.

b0dfbd23

06 Feb, 2020 9 commits

Maps and lists now return their style and key and value styles correctly. · 975d6c4b
Eric Myhre authored Feb 06, 2020

975d6c4b
Finish connecting map and list to rest of scalars. · 32be50ab
Eric Myhre authored Feb 06, 2020
```
Also map able to create child list.
```
32be50ab

Grind out the basic unit values (null and undef). · 9aeff0a4

Eric Myhre authored Feb 06, 2020

Almost the same as the old ones.  Builders and Styles change, ofc.

Also took the liberty of adding TypeName values to all ErrWrongKind.
It's apparently now the convention that untyped things set that with
the lowercase kind name.

9aeff0a4

List implementation. · 14083ea0

Eric Myhre authored Feb 06, 2020

This... just about completes the pantheon of basic stuff in ipldfree
that's getting a reimplementation.

Maps need to be updated to also support recursing into a list, still.

Tests needed badly. Next couple of commits will start handling that.
Need to reorganize some of the existing ones as well. This is clearly
going to be one of those things that needs a large corpus... and for
bonus fun, we need to make a corpus that's ready to go for both working
with schemas and without. Fun fun fun.

14083ea0

Rather critically, recursives need to reach into their child assembler and invalidate them. · a52af3ba

Eric Myhre authored Feb 06, 2020

For all the paths around scalar children, invalidating the child
assembler's pointer back to parent has been sufficient to obstruct any
problematic mutations, but with how directly the operations on map
children are passed through to their 'w' target, we also need to do
an invalidation that affects there.

a52af3ba

typo fixes · 8929c2f1
Eric Myhre authored Feb 06, 2020

8929c2f1
Additional basic tests around strings. · e5b0b8a6
Eric Myhre authored Feb 06, 2020

e5b0b8a6

Check for sideffects from invalid manipulations. · 37dd2d99

Eric Myhre authored Feb 06, 2020

Interestingly, the codegen'd map ends up doing this in a different way
than the generic map: since it *is* actually wrapping and delegating to
another assembler to handle the child value, it has to make sure *that*
is invalidated; otherwise the fact we hand calls to the child assembler
right away before doing our 'flush' would be problematic.

Splitting the 'flush' in half would also work, of course. At present
it seems like six of one and half a dozen of the other. We can subject
this to microbenchmarking later if it seems relevant.

37dd2d99

Remarks on effects and visibility and wip states. · 22687e04

Eric Myhre authored Feb 06, 2020

And tests for them and when they should fail.

Which provokes the inclusion of a few fixes, to make things fail where
they dang well should.

Tests, yo.  They're definitely important.  *Especially* for when things
get *off* the happy path.

22687e04