Commits · 99058684ca92619e6980fb0dd4c4f974a564bcdc · ld / go-ld-prime

11 Aug, 2019 5 commits

Partial typed.wrapnodeStruct implementation. · 99058684

Eric Myhre authored Aug 11, 2019

It covers the bases on reading, and has a NodeBuilder which works to
create new nodes based on the type-level structure.

(Quite a few other things about it are incomplete, and it might end
up getting merged this way, because the goal here was primarily to
scope out if any of our abstractions will shatter badly, and we've
now got information on that.)

This seems to indicate we are indeed on the right track, regarding
some of those comments in the codegeneration work in the previous
commit: check out the 'Representation' method.  Yep -- we've got
multiple nodebuilders in our future.
Signed-off-by: Eric Myhre <hash@exultant.us>

99058684

Commit some work-in-progress on typed struct gen. · 0d349cf6

Eric Myhre authored Aug 11, 2019

This has been on the clipboard for a while now.  When I first wrote it,
the idea that we'd need multiple NodeBuilders per typed Node seemed a
little staggering; now that the idea has had a while to cook, it seems
clear and uncontrovertial, so this diff now looks like far less of a
noodle-baker than it originally was.
(I had been planning to switch to working on runtime (that is,
wrapper-based, non-codegen) typed nodes for a while to validate this
idea, and now I have some partial diffs from that to land too, but
the validation returned "true", so now I regret having multiple irons
in the fire... sigh!  Software development is hard.)
Signed-off-by: Eric Myhre <hash@exultant.us>

0d349cf6

Add missing accessors to schema.TypeStruct. · 9f6b4fce
Eric Myhre authored Aug 11, 2019
```
Signed-off-by: Eric Myhre <hash@exultant.us>
```
9f6b4fce
Merge branch 'just-string' · 1c36566f
Eric Myhre authored Aug 11, 2019

1c36566f

Correct a comment about performance in justString. · d0ce3ded

Eric Myhre authored Aug 11, 2019

There's actually a doozy of performance considerations in this commit.
Verifying (and then correcting) that comment kicked off a surprisingly
deep and demanding binge of research.

(Some of the considerations are still only "considerations",
unfortunately -- one key discovery is that (surprise!) conclusive
choices require more info than microbenchmarks alone can yield.)

First, the big picture:

One of the things we need to be really careful about throughout a
system like go-ipld-prime (where we're dealing with large amounts of
serialization) is the cost of garbage collection. Since we're often
inside of an application's "tight loop" or "hot loop" or whatever you
prefer to call it, if we lean on the garbage collector too heavily...
it's very, very likely to show up a system-wide impact.

So, in essence, we want to call "malloc" less. This isn't always easy.
Sometimes it's downright impossible: we're building large tree
structures; it's flatly impossible to do this without allocating
some memory.
In other cases, there are avoidable things: and in particular, one
common undesirable source of allocations comes from "autoboxing" around
interfaces. (More specifically, the name of the enemy here will often
show up on profiling reports as "runtime.convT2I".) Sometimes this one
can be avoided; other times, not.

Now, a more detailed picture:

There are actually several functions in the runtime that relate to
memory allocation and garbage collection, and the less we use any of
them, the better; but also, they are not all created equal.

These are the functions that are of interest:

- runtime.convT2I / runtime.convT2E
- runtime.newObject
- runtime.writeBarrier / runtime.gcWriteBarrier
- runtime.convTstring / etc

Most of these functions call `runtime.mallocgc` internally, which is
why they're worth of note. (writeBarrier/gcWriteBarrier are also
noteworthy, but are different beasts.)

Different kinds of implementations of something like `justString` will
cause the generated assembly to contain calls to various combinations
of these runtime functions when they're converted into a `Node`.

These are the variations considered:

- Variation 1: `type justString string`:
results in `runtime.convTstring`.

- Variation 2: `type justString struct { x string }`:
results in `runtime.convT2I`.

- Variation 3: as above, but returning `&justString{val}`:
results in `runtime.newobject` *and* its friends
`runtime.writeBarrier` and `runtime.gcWriteBarrier`.

The actual performance of these... it depends.

In microbenchmarks, I've found the above examples are roughly:

- Variation 1: 23.9 ns/op 16 B/op 1 allocs/op
- Variation 2: 31.1 ns/op 16 B/op 1 allocs/op
- Variation 3: 23.0 ns/op 16 B/op 1 allocs/op

So, a couple things about this surprised me; and a couple things I'm
still pretty sure are just microbenchmarks being misleading.

First of all: *all* of these call out to `mallocgc` internally. And so
we see an alloc-per-op reported in all three of them. (This actually
kinda bugs me, because I feel like we should be able to fit all the
requisite information in the first case directly into the interface,
which is, if I understand correctly, already always two words; and
arguably, the compiler might be smart enough to do this in the second
case as well. But I'm wrong about this, for whatever reason, so let's
just accept this one and move along.) But they vary in time. Why?

Variation 2 seems to stand out as slower. Interestingly, it turns out
`convT2E` and `convT2I` are extra problematic because they involve a
call of `typedmemmove` internally -- as a comment in the source says,
there's both an allocation, a zeroing, and then a copy here (at least,
as of go1.12); this is a big bummer. In addition, even before getting
that deep, if you check out the disassembly of just our functions:
for our second variation, as inlined into our microbenchmark, there are
9 instructions, plus 1 'CALL'; vs only 3+1 for the first variation.
This memmove and extra instructions seems to be the explainer for why
our second variation (`struct{string}`) is significantly (~8ns) slower.

(And here I thought variation two would do well! A struct with one
field is the same size as the field itself; a string is one word
of pointer; and an interface has another word for type; and that's
our two words, so it should all pack, and on the stack! Alas: no.)

Now back up to Variation 1 (just a typedef of a string): this one
invokes `runtime.convTstring`, and while that does invoke `mallocgc`,
there's a detail about how that's interesting: it does it with an ask
for a small number of bytes. Specifically, it asks for... well,
`unsafe.Sizeof(string)`, so that varies by platform, but it's "tiny".
What's "tiny" mean? `mallocgc` has a specific definition of this, and
you can see it by grepping the runtime package source for
"maxTinySize": it's 16 bytes. Things under this size get special
treatment from a "tiny allocator"; this seems to be why
`runtime.convTstring` is relatively advantaged.

(You can see benchmarks relating to this in the runtime package itself:
try `go test -run=x -bench=Malloc runtime`. There's a *huge* cliff
between MallocLargeStruct versus the rest of its fellows.)

Variation 3 also appears competitive. This one surprises me, and this
is where I still feel like microbenchmarks must be hoodwinking.
The use of `runtime.newobject` seems to hit the same corners as
`runtime.convTstring` at runtime in our situation here: it's "tiny";
that's neat. More confusingly, though, `runtime.writeBarrier` and
`runtime.gcWriteBarrier` *should* be (potentially) very high cost
calls. And for some reason, they're not. This particular workload in
the microbenchmark must just-so-happen to tickle things in such a way
that these calls are free (literally; within noise levels), and I
suspect that's a happy coincidence in the benchmark that won't at all
hold in real usage -- as any amount of real memory contention appears,
the costs of these gc-related calls can be expected to rise.

I did a few more permutations upon Variations 2 and 3, just out of
curiosity and for future reference, adding extra fields to see if any
interesting step functions revel themselves. Here's what I found:

- {str,int,int,int,int} is 48 bytes; &that allocs the same amount; in speed, & is faster; 33ns vs 42ns.
- {str,int,int,int} is 48 bytes; &that allocs the same amount; in speed, & is faster; 32ns vs 42ns.
- {str,int,int} is 32 bytes; &that allocs the same amount; in speed, & is faster; 32ns vs 39ns.
- {str,int} is 32 bytes; &that allocs the same amount; in speed, & is faster; 31ns vs 38ns.
- {str} is 16 bytes; &that allocs the same amount; in speed, & is faster; 24ns vs vs 32ns.

Both rise in time cost as the struct grows, but the non-pointer
variant grows faster, and it experiences a larger step of increase
each time the size changes (which in turn steps because of alignment).
The &{str} case is noticeably faster than the apparently linear
progression that starts upon adding a second field; since we see the
number 16 involved, it seems likely that this is the influence of the
"tiny allocator" in action, and the rest of the values are linear
relative to each other because they're all over the hump where the
tiny allocator special path disengages.

(One last note: there's also a condition about "noscan" which toggles
the "tiny allocator", and I don't fully understand this detail. I'd
have thought strings might count as a pointer, which would cause our
Variation 3 to not pass the `t.kind&kindNoPointers` check; but the
performance cliff observation described in the previous paragraph
seems to empirically say we're not getting kicked out by "noscan".
(Either that or there's some yet-other phenomenon I haven't sussed.))

(Okay, one even-laster note: in the course of diving around in the
runtime malloc code, I found an interesting comment about using memory
"from the P's arena" -- "P" being one of the letters used when talking
about goroutine internals -- and I wonder if that contributes to our
little mystery about how the `gcWriteBarrier` method seems so oddly
low-cost in these microbenchmarks: perhaps per-thread arenas combined
with lack of concurrency in the benchmark combined with quickly- and
sequentially-freed allocations means any gcWriteBarrier is essentially
reduced to nil. However, this is just a theory, and I won't claim to
totally understand the implications of this; commenting on it here
mostly to serve as a pointer to future reading.)

---

Okay. So what comes of all this?

- I have two choices: attempt to proceed further down a rabbithole of
microbenchmarking and assembly-splunking (and next, I think, patching
debug printfs into the compiler and runtime)... or, I can see that last
one as a step too far for today, pull up, commit this, and return to
this subject when there's better, less-toy usecases to test with.
I think the latter is going to be more productive.

- I'm going to use the castable variation here (Variation 1). This
won't always be the correct choice: it only flies here because strings
are immutable anyway, and because it's a generic storage implementation
rather than having any possibility of additional constraints, adjuncts
from the schema system, validators, etc; and therefore, I don't
actually care if it's possible to cast things directly in and out of
this type (since doing so can't break tree immutability, and it can't
break any of those other contracts because there aren't any).

- A dev readme file appears. It discusses what choices we might make
for other cases in the future. It varies by go native kind; and may
be different for codegen'd types vs general storage implementations.

- Someday, I'd like to look at this even further. I have a persistent,
nagging suspicion that it should be possible to make more steps in the
direction of "zero cost abstractions" in this vicinity. However, such
improvements would seem to be pretty deep in the compiler and runtime.
Someday, perhaps; but today... I started this commit in search of a
simple diff to a comment! Time to reel it in.

Whew.
Signed-off-by: Eric Myhre <hash@exultant.us>

d0ce3ded

08 Aug, 2019 5 commits

Add a 'justString' implementation. · 040f47d2

Eric Myhre authored Aug 05, 2019

This is meant to be about the cheapest implementation of Node possible.

It's important to have this given how often we need string nodes.
In the example of the moment: in writing some typed.Node systems, I
need some intermediate string nodes just to nudge around map processes.
Those nodes need to be handy and their implementation must be cheap.

It's also useful that we can have a constructor which doesn't return
errors, so it's easier to chain.  The NodeBuilder features are
restrained to needing to be able to yield errors for interface
satisfaction purposes, in turn needed for more complex features.
But when building a specific node, there's no reason we need to go
through NodeBuilder at all.

I'm putting this in the ipldfree package at the moment.  I wouldn't
mind putting a constructor for this in the root ipld package as well,
but that would either mean a cyclic import or putting this
implementation as a blessed one in the main package.
(Putting this as a blessed default implementation in the main package
is possible, and can be considered later, but is a "not today" thing;
having these implementations in a subpackage is a good forcing-function
for development to make sure we don't accidentally leave any important
things unexported in the main package.)

040f47d2

Merge pull request #25 from ipld/bugs/fix-recursion-and-union-selectors · 5ace6441
Eric Myhre authored Aug 08, 2019
```
Fix interaction between ExploreRecursive and ExploreUnion
```
5ace6441
Merge pull request #20 from ipld/feat/better-traversal · 63048dbf
Eric Myhre authored Aug 08, 2019
```
More complete selector traversal
```
63048dbf
Merge pull request #19 from ipld/feat/selector-builder · 499d204a
Eric Myhre authored Aug 08, 2019
```
Builder for selectors
```
499d204a
Merge pull request #26 from ipld/feat/recursive-selectors · 30cf80e2
Eric Myhre authored Aug 08, 2019
```
Feat/recursive selectors
```
30cf80e2

06 Aug, 2019 11 commits

fix(selectors): fix recursion and union interaction · c734676e

hannahhoward authored Jul 29, 2019

Fixes a bug where ExploreRecursive would not properly replace ExploreRecursiveEdge (and cause a
panic when traversed) if ExploreUnion was part of the recursion tree.

c734676e

feat(traversal): add LastBlock progress info · 76ca1050
hannahhoward authored Jul 26, 2019
```
Properly record last block traversed
```
76ca1050

refactor(traversal): readability refactor · 2864b8df

hannahhoward authored Jul 18, 2019

refactoring functions in traverseInformatively to simply and optimize logic and also minimize inner
loop function calls

2864b8df

feat(traversal): support lists and low cardinality · 31c9bb76

hannahhoward authored Jul 17, 2019

Improve the traverse function so that it only explores relevant fields for low cardinality selectors
and traverses high cardinality selectors for lists correctly

31c9bb76

feat(traversal): traverse links · fcc1cc53
hannahhoward authored Jul 17, 2019
```
Add link loading to traverse informatively so that you can traverse across links
```
fcc1cc53

fix(encoding): fix decode for simple nodes · b5a8551a

hannahhoward authored Jul 17, 2019

Decoding of nodes that are just simple values was erroring for both CBOR & JSON. This fixes the
issue as well as fixing it in the general unmarshal.

b5a8551a

refactor(builder): extract to package · 1dac0811
hannahhoward authored Aug 06, 2019
```
Extract the selector buidler to a seperate package
```
1dac0811

feat(selectors): create a builder · e4e37206

hannahhoward authored Jul 17, 2019

Adds a builder class to quickly assemble selectors nodes, using a DSL that enforces the underlying
schema. This allows quick assembly of selector nodes. This will likely be removed once IPLD Schema
code gen is finished, but for the time being, it provides a quick way to ensure mostly typesafe
selector nodes.

e4e37206

refactor(selector): Move parse context to receiver · cbedbe9a

hannahhoward authored Aug 06, 2019

Rename SelectorContext to ParsedParent (more accurate) and put in a ParseContext state object which
is tracked w/o var args

cbedbe9a

feat(selectors): better recursive selector validation · 8101125e
hannahhoward authored Jul 18, 2019
```
Support validating recursive selectors to match ExploreRecursive with ExploreRecursiveEdge
```
8101125e

feat(selectors): implement recursive selector · a2104b7f

hannahhoward authored Jul 16, 2019

An initial implementation of recursive selectors that offers a way to traverse them. Not covered is
validating ExploreRecursiveEdge being under an ExploreRecursive selector

a2104b7f

05 Aug, 2019 4 commits

Merge branch 'nodebuilder-clarifications' · 631c9dad
Eric Myhre authored Aug 05, 2019

631c9dad

Add description of Node.NodeBuilder <-> ADL relationship. · 06471064

Eric Myhre authored Aug 05, 2019

Mind, I don't know if all the details of this have been worked out.
In particular, where to get the LinkStorer and other fun stuff passed
through this interface might be a fun dance (if, indeed, such a thing
ends up needed -- this hasn't been written yet!).  We'll see.
One thing is clear: we want this contract here for all the same
already-known reasons: a traversal.Transform over a structure using
an ADL should "just work".
Signed-off-by: Eric Myhre <hash@exultant.us>

06471064

Clarify role of Node.NodeBuilder. · 77d79154

Eric Myhre authored Aug 05, 2019

Prompted in part by
https://github.com/ipld/go-ipld-prime/pull/21#issuecomment-518115245 .

Turns out some of the stuff our earliest Node implementations do are
actually pretty unusual.  Many other features have more strictures.
Therefore, it's important to specify the behavior such that the
features that have to operate within more restrictions can still
comply, and it's clear that anything else shouldn't be expected!

(Perhaps we should just... make the ipldfree.Node implementations not
do the flexible things they do right now.  That package is starting
to look in need of some renovation anyway.  We'll see.)
Signed-off-by: Eric Myhre <hash@exultant.us>

77d79154

Merge pull request #21 from ipld/schema-and-codegen-fun · c71d7fb2
Eric Myhre authored Aug 05, 2019
```
Schema and codegen fun
```
c71d7fb2

29 Jul, 2019 1 commit
- Merge pull request #23 from ipld/bugs/cbor-link-traversal · f538ddd4
  Eric Myhre authored Jul 29, 2019
```
fix(encoding): match go-ipld-cbor cid encoding
```
  f538ddd4
27 Jul, 2019 1 commit

fix(encoding): match go-ipld-cbor cid encoding · 015389d5

hannahhoward authored Jul 26, 2019

go-ipld-cbor encodes a 0 byte ahead of cid bytes for multibase support - to maintain compatibility
of nodes encoded to CBOR, add this feature

015389d5

21 Jul, 2019 1 commit

One more larger struct in the codegen test corpus. · fcda4cc7

Eric Myhre authored Jul 21, 2019

This one was particularly aimed at making sure I hit all the hardest
situations as trying to write the generators for NodeBuilders.

It certainly did so! I found enough conceptual/design work to do that
I'm actually going to park this here for a while, probably land this
branch back to master before it gets any longer, and then approach
things from some other angles for a while.

In particular: We're going to need several *different* NodeBuilders
for schema-typed nodes. The astute reader probably already noticed
this, as it's implied by the `typed.Node{}.Representation() Node`
method: that adds a second Node instance, and that means a second
NodeBuilder. What's a little more fun yet is there might be more than
two: for example, if we want the serial mode to support variation in
field order for structs, that's another branch we need to account for
somewhere... and, ordered vs unordered modes might need different
structures of memory to do their job efficiently, so that means wholly
different implementations of the interface. Whee.

So. I'm thinking I might switch tack to knocking around in the
(non-codegen) typed Node package for a while and making sure nothing
looks wildly amiss over there while exploring these same features
Then, when coming back here: looks like we're due for that breakdown
of `EmitNodeBuilder{Detail...}` methods: and perhaps that's
actually going to belong on a new type rather than glommed on flatly
to `typeGenerator`... since there may be a n:1 relationship there!

fcda4cc7

20 Jul, 2019 8 commits

Add generateKindStruct.EmitNodeMethodMapIterator. · 41aed25d