Commit 02b9b4a9 authored by Eric Myhre

Deeper prototyping of 'racker' solution, and docs.

I'm still scratching out prototypes in another directory to make it
easier to focus on the details I'm interested in rather than get
wrapped up with the kerfuffling details of the existing full
interfaces... as well as easier to try out different interfaces.
And at some point, of course, we want *codegen* for these things, while
what I'm plunking about with here is a sketch of the expected *output*.
So, this is a large step removed from the final code... but it's easier
to sketch this way and "imagine" where the templating can later appear.

This solution approach seems to be going largely well.

And we have some tests which count allocs very carefully to confirm
the desired result!

Still a lot to go.  There's large hunks of comments and unresolved
details still in this commit; I just need a checkpoint.

Some things around creation of maps are still looking tricky to
support optimizations for.  Research needed there.  A comment hunk
describes the questions, but so far there's no tradeoff-free answer.

Perhaps most usefully: here's also a checkpoint of the
"HACKME_memorylayout" document!  It also still has a few todos, but
it's the most comprehensive block of prose I've got in one place
so far.  Hopefully it will be useful reading for anyone else wanting
to get up to speed on the in-depth "why" -- it's awfully,
awfully hard to infer this kind of thing from the eschatology of
just observing where pointers are in a finished product!
Signed-off-by: Eric Myhre <hash@exultant.us>
parent 38aa8b7c
package solution
type Node interface {
LookupString(key string) Node
}
type NodeBuilder interface {
InsertByString(key string, value Node)
Build() Node
}
func (n *Stroct) LookupString(key string) Node {
switch key {
case "foo":
return &n.foo
case "bar":
return &n.bar
default:
panic("no")
}
}
func (n *Strooct) LookupString(key string) Node {
panic("nyi")
}
func (n *String) LookupString(key string) Node {
panic("nyi")
}
func NewStroctBuilder() NodeBuilder {
r := _Stroct__Racker{}
return &r._Stroct__Builder
}
func (nb *_Stroct__Builder) InsertByString(key string, value Node) {
switch key {
case "foo":
if nb.isset_foo {
panic("cannot set field repeatedly!") // surprisingly, could make this optional without breaking memviz semantics.
}
if &nb.d.foo != value.(*Strooct) { // shortcut: if shmem, no need to memcopy self to self!
nb.d.foo = *value.(*Strooct) // REVIEW: maybe this isn't necessary here and only should be checked in the racker?
}
// interestingly, no need to set frz_foo here... nb.d isn't revealed yet, so no frozen pointers into it can exist.
nb.isset_foo = true
case "bar":
nb.d.bar = *value.(*String)
nb.isset_bar = true
default:
panic("no")
}
}
func (br *_Stroct__Racker) InsertByString(key string, value Node) {
switch key {
case "foo":
if &br.d.foo == value.(*Strooct) { // not just a shortcut: if shmem, this branch insert is ONLY possible WHEN the field is frozen.
br.isset_foo = true // FIXME straighten this out to dupe less plz?! also should panic if already set, consistently (or not).
}
if br.frz_foo {
panic("cannot set field which has been frozen due to shared immutable memory!")
}
// TODO finish and sanity check in morning.
// ... seems like there's a lot less shared code than i thought.
// ... resultingly, i'm seriously questioning if builders and rackers deserve separate types.
// ... what's that for? reducing the (fairly temporary) size of builders if used sans racking style? does that... matter?
// no, by the morning light i think this can in fact just be simplified a lot and code share will emerge.
default:
panic("no")
}
}
func (nb *_Stroct__Builder) Build() Node {
return nb.d
}
func (br *_Stroct__Racker) Build() Node {
// TODO freeze something. but what? do i really need a pointer "up" that to be fed into here?
// is that even a pointer? if it depends on what this value is contained in... remember, for maps and lists, it's different than structs: bools ain't it.
// if we try to use a function pointer here: A) you're kidding.. that will not be fast.. B) check to see if it causes fucking allocs, because it fucking might, something about grabbing methods causing closures.
// AH HA. Obrigado!:
// We can actually use the invalidation of the 'd' pointer as our signal, and do the freezing update the next time either Set or GetBuilderForValue is called.
// For structs, since we'll keep a builder/racker per field, that's enough state already. You'll be able to get the childbuilders for multiple fields simultaneously (though you shouldn't rely on it, in general).
// For lists, there will be a frozen offset and an isset offset. The former will be able to advance well beyond the latter. You'll only be allowed to have one childbuilder at a time.
// For maps, we'll still do copies and allocs (reshuffling maps with large embeds is its own cost center usually better avoided). You'll only be able to use one keybuilder at a time (keys will be copied when done), and one value builder (but each use of the value builder will incur an alloc for its 'd' field).
// And here we hit one more piece of Fun that might require a small break to current interfaces: in order to make it possible for keys to reuse a swap space and not get shifted to heap... we can't return them.
// Which means no ChildBuilderForKey method at all: that interface doesn't fly.
// Oh dear. This is unpleasant. More thought needed on this.
// ... Hang on. Have I imagined this more complicated than it is?
// For structs, the key is always a string.
// For maps with enum keys... always a string. (...I don't know how to say enums of int repr would work; I don't think they do, Because Data Model rules.)
// For maps with union keys... okay, we're gonna require that those act like struct keys (unions used in keys will have to have nonptr internals; a fun detail).
// For maps with struct keys... okay, sure it has to be a string from repr land... but also yes, we have to accept the mappy form, for typed level copies and transforms.
// Yup. That last bit means a need for swap space while assembling it. Which we don't want to be forced to return as a node, because that would cause a heap alloc. Which we'd immediately undo since map keys need to be by value to behave correctly. Ow.
// I... don't know what to do about this. We could have the Build method flat out return a nil, and just "document" that. But it's pretty unappealing to make such an edge case.
// Also, yeah, we really do need an InsertByString method on MapBuilder. Incurring some nonsense boxing for string keys in structs is laughable.
// If you're thinking a workaround such as having a single swap space for building a single justString key for temporary use would help... no, sadly: "single swap space" plus visibility model won't jive like that.
// (And even if it did, ever asking an end user to write that much boilerplate is still pretty crass... as well as easily avoidable for minimal library code size cost.)
return br.d
}
package solution
// -- ipld schema -->
/*
type Stroct struct {
foo Strooct
bar String
}
type Strooct struct {
zot String
zam String
zems Strems
zigs Zigs
zee Zahn
}
type Strems [String]
type Zigs {String:Zahn}
type Zahn struct {
bahn String
}
*/
// -- the readable types -->
type Stroct struct {
foo Strooct
bar String
}
type Strooct struct {
zot String
zam String
zems Strems
zigs Zigs
zee Zahn
}
type String struct {
x string
}
type Strems struct {
x []String
}
type Zigs struct {
x map[String]Zahn
}
type Zahn struct {
bahn String
}
// -- the builders alone -->
type _Stroct__Builder struct {
d *Stroct // this pointer aims into the thing we're building (it's as yet unrevealed). it will be nil'd when we reveal it.
isset_foo bool
isset_bar bool
}
type _Strooct__Builder struct {
d *Strooct
isset_zot bool
isset_zam bool
isset_zems bool
isset_zigs bool
isset_zee bool
}
type _Strems__Builder struct {
d *Strems
// TODO
}
type _String__Builder struct {
// okay, this one is a gimme: data contains only a ptr itself, effectively.
// TODO: still might need a 'd' pointer, in case you're assigning into a list and wanna save boxing allocs?
// ... so long as we're doing wrapper types (to block blind casting), we're gonna have those boxing alloc concerns.
}
type _Zigs__Builder struct {
d *Zigs
// TODO
}
// -- the rackerized builders -->
type _Stroct__Racker struct {
_Stroct__Builder // most methods come from this, but child-builder getters will be overridden.
cb_foo _Strooct__Racker // provides child builder for field 'foo'.
frz_foo bool // if true, must never yield cb_foo again. becomes true on cb_foo.Build *or* assignment to field (latter case reachable if the value was made without going through cb_foo).
}
type _Strooct__Racker struct {
_Strooct__Builder // most methods come from this, but child-builder getters will be overridden.
// TODO: we might still actually need builders for scalars. if it's got a wrapper struct, it would incur boxing. damnit.
zems _Strems__Racker
zigs _Zigs__Racker
zee _Zahn__Racker
}
type _Strems__Racker struct {
// TODO didn't finish
}
type _Zigs__Racker struct {
// TODO didn't finish
}
type _Zahn__Racker struct {
// TODO didn't finish
}
// right, here's one wild ride we haven't addressed yet:
// if you build a thing that resides in racker-operated memory, you get a node.
// so far so good, and you can even use it multiple places.
// if assignments go through a Maybe struct?
// ... actually, this is all fine.
// the 'MaybeFoo' structs should store pointers to the thing. done.
// if the thing originated in racker-operated memory, this is free;
// if it didn't, it's a cost you would've hit somewhere else already anyway too.
// done. it's fine.
package solution
import (
"fmt"
"runtime"
"testing"
)
func init() {
runtime.GOMAXPROCS(1) // necessary if we want to do precise accounting on runtime.ReadMemStats.
}
var sink interface{}
func TestAllocCount(t *testing.T) {
memUsage := func(m1, m2 *runtime.MemStats) {
fmt.Println(
"Alloc:", m2.Alloc-m1.Alloc,
"TotalAlloc:", m2.TotalAlloc-m1.TotalAlloc,
"HeapAlloc:", m2.HeapAlloc-m1.HeapAlloc,
"Mallocs:", m2.Mallocs-m1.Mallocs,
"Frees:", m2.Frees-m1.Frees,
)
}
var m [99]runtime.MemStats
runtime.GC()
runtime.GC() // i know not why, but as of go-1.13.3, and not in go-1.12.5, i have to call this twice before we start to get consistent numbers.
runtime.ReadMemStats(&m[0])
var x Node
x = &Stroct{}
runtime.GC()
runtime.ReadMemStats(&m[1])
x = x.LookupString("foo")
runtime.GC()
runtime.ReadMemStats(&m[2])
sink = x
runtime.GC()
runtime.ReadMemStats(&m[3])
sink = nil
runtime.GC()
runtime.ReadMemStats(&m[4])
memUsage(&m[0], &m[1])
memUsage(&m[0], &m[2])
memUsage(&m[0], &m[3])
memUsage(&m[0], &m[4])
}
about memory layout
===================
Memory layout is important when designing a system for going fast.
It also shows up in exported types (whether or not they're pointers, etc).
For the most part, we try to hide these details;
or, failing that, at least make them appear consistent.
There's some deeper logic required to *pick* which way we do things, though.
Prerequisite understandings
--------------------------
The following headings contain brief summaries of information that's important
to know in order to understand how we designed the IPLD data structure
memory layouts (and how to tune them).
Most of these concepts are common to many programming languages, so you can
likely skim those sections if you know them. Others are fairly golang-specific.
### heap vs stack
The concept of heap vs stack in Golang is pretty similar to the concept
in most other languages with garbage collection, so we won't cover it
in great detail here.
The key concept to know: the *count* of allocations which are made on
the heap significantly affects performance. Allocations on the heap
consume CPU time both when made, and later, as part of GC.
The *size* of the allocations affects the total memory needed, but
does *not* significantly affect the speed of execution.
Allocations which are made on the stack are (familiarly) effectively free.
### escape analysis
"Escape Analysis" refers to the efforts the compiler makes to figure out if some
piece of memory can be kept on the stack or if it must "escape" to the heap.
If escape analysis finds that some memory can be kept on the stack,
it will prefer to do so (and this is faster/preferable because it both means
allocation is simple and that no 'garbage' is generated to collect later).
Since whether things are allocated on the stack or the heap affects performance,
the concept of escape analysis is important. The details (fortunately) are not:
For the purposes of what we need to do in our IPLD data structures,
our goal with our code is to A) flunk out and escape to heap
as soon as possible, but B) do that in one big chunk of memory at once
(because we'll be able to use [internal pointers](#internal-pointers)
thereafter).
One implication of escape analysis that's both useful and easy to note is that
whether or not you use a struct literal (`Foo{}`) or a pointer (`&Foo{}`)
*does not determine* whether that memory gets allocated on the heap or stack.
If you use a pointer, and the escape analysis can prove that the pointer
never escapes, the memory will still end up allocated on the stack.
Another way to think about this is: use pointers freely! By using pointers,
you're in effect giving the compiler *more* freedom to decide where memory resides;
in contrast, avoiding the use of pointers in method signatures, etc, will
give the compiler *less* choice about where the memory should reside,
and typically forces copying. Giving the compiler more freedom generally
has better results.
**pro-tip**: you can compile a program with the arguments `-gcflags "-m -m"` to
get lots of information about the escape analysis the compiler performs.
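As a small illustration (`Foo` here is a hypothetical type, not part of this codebase), the following program shows that `&Foo{}` alone decides nothing -- compiling it with `go build -gcflags "-m -m"` reports one `Foo` escaping and one staying put:

```go
package main

import "fmt"

type Foo struct{ a, b, c int64 }

// escapes returns the pointer, so the Foo must outlive the call:
// escape analysis will move it to the heap.
func escapes() *Foo {
	return &Foo{}
}

// stays uses the pointer only locally: despite the '&', escape analysis
// can prove it never leaves, so the Foo can live on the stack.
func stays() int64 {
	f := &Foo{a: 1}
	return f.a
}

func main() {
	fmt.Println(escapes() != nil, stays()) // true 1
}
```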
### embed vs pointer
Structs can be embedded -- e.g. `type Foo struct { field Otherstruct }` --
or referenced by a pointer -- e.g. `type Foo struct { field *Otherstruct }`.
The difference is substantial.
When structs are embedded, the layout in memory of the larger struct is simply
a concatenation of the embedded structs. This means the amount of memory
that structure takes is the sum of the size of all of the embedded things;
and by the other side of the same coin, the *count* of allocations needed
(remember! the *count* affects performance more than the *size*, as we briefly
discussed in the [heap-vs-stack](#heap-vs-stack) section) is exactly *one*.
When pointers are used instead of embedding, the parent struct is typically
smaller (pointers are one word of memory, whereas the embedded thing can often
be larger), and null values can be used... but if fields are assigned to some
other value than null, there's a very high likelihood that heap allocations
will start cropping up in the process of creating values to take pointers
to before then assigning the pointer field! (This can be subverted by
either [escape analysis](#escape-analysis) (though it's fairly uncommon),
or by [internal pointers](#internal-pointers) (which are going to turn out
very important, and will be discussed later)... but it's wise to default
to worrying about it until you can prove that one of the two will save you.)
When setting fields, another difference appears: a pointer field takes one
instruction (assuming the value already exists, and we're not invoking heap
allocation to get the pointer!) to assign,
whereas an embedded field generally signifies a memcopy, which
may take several instructions if the embedded value is large.
You can see how the choice between use of pointers and embeds results
in significantly different memory usage and performance characteristics!
(Quick mention in passing: "cache lines", etc, are also potential concerns that
can be addressed by embedding choices. However, it's probably wise to attend
to GC first. While cache alignment *can* be important, it's almost always going
to be a winning bet that GC will be a much higher impact concern.)
It is an unfortunate truth that whether or not a field can be null in Golang
and whether or not it's a pointer are two properties that are conflated --
you can't choose one independently of the other. (The reasoning for this is
based on intuitions around mechanical sympathy -- but it's worth mentioning that
a sufficiently smart compiler *could* address both the logical separation
and simultaneously have the compiler solve for the mechanical sympathy concerns
in order to reach good performance in many cases; Golang just doesn't do so.)
### interfaces are two words and may cause implicit allocation
Interfaces in Golang are always two words in size. The first word is a pointer
to the type information for what the interface contains. The second word is
a pointer to the data itself.
This means if some data is assigned into an interface value, it *must* become
a pointer -- the compiler will do this implicitly; and this is the case even if
the type info in the first word retains a claim that the data is not a pointer.
In practice, this also almost guarantees that the data in question
will escape to the heap.
(This applies even to primitives that are one word in size! At least, as of
golang version 1.13 -- keep an eye on the `runtime.convT32` functions
if you want to look into this further; the `mallocgc` call is clear to see.
There's a special case inside `malloc` which causes zero values to get a
free pass (!), but in all other cases, allocation will occur.)
Knowing this, you probably can conclude a general rule of thumb: if your
application is going to put a value in an interface, and *especially* if it's
going to do that more than once, you're probably best off explicitly handling
it as a pointer rather than a value. Any other approach will be very likely to
provoke unnecessary copy behavior and/or multiple unnecessary heap allocations
as the value moves in and out of pointer form.
(Fun note: if attempting to probe this by microbenchmarking experiments, be
careful to avoid using zero values! Zero values get special treatment and avoid
allocations in ways that aren't general.)
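The boxing behavior can be observed directly with `testing.AllocsPerRun` (a sketch; the value is deliberately nonzero and larger than a byte, since zero and tiny values can hit special no-alloc paths in the runtime):

```go
package main

import (
	"fmt"
	"testing"
)

var sink interface{}

// boxedAllocs measures allocations when a value is assigned into an interface.
func boxedAllocs() float64 {
	n := int64(123456) // nonzero and >255: avoids the runtime's special-cased values.
	return testing.AllocsPerRun(100, func() {
		sink = n // the value must become a pointer: one heap alloc per assignment.
	})
}

// unboxedAllocs measures allocations when a pointer is assigned into an interface.
func unboxedAllocs() float64 {
	n := int64(123456)
	p := &n
	return testing.AllocsPerRun(100, func() {
		sink = p // already a pointer: fits the interface's data word, no new alloc.
	})
}

func main() {
	fmt.Println(boxedAllocs(), unboxedAllocs()) // 1 0
}
```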
### internal pointers
"Internal pointers" refer to any pointer taken to some position in a piece
of memory that was already allocated somewhere.
For example, given some `type Foo struct { a, b, c Otherstruct }`, the
value of `f := &Foo{}` and `b := &f.b` will be very related: they will
differ by the size of `Otherstruct`!
The main consequence of this is: using internal pointers can allow you to
construct large structure containing many pointers... *without* using a
correspondingly large *count of allocations*. This unlocks a lot of potential
choices for how to build data structures in memory while minimizing allocs!
Internal pointers are not without their tradeoffs, however: in particular,
internal pointers have an interesting relationship with garbage collection.
When there's an internal pointer to some field in a large struct, that pointer
will cause the *entire* containing struct to be still considered to be
referenced for garbage collection purposes -- that is, *it won't be collected*.
So, in our example above, keeping a reference to `&f.b` will in fact cause
memory of the size of *three* `Otherstruct`s to be uncollectable, not one.
You can find more information about internal pointers in this talk:
https://blog.golang.org/ismmkeynote
### inlining functions
Function inlining is an important compiler optimization.
Inlining optimizes in two regards: one, can remove some of the overhead of
function calls; and two, it can enable *other* optimizations by getting the
relevant instruction blocks to be located together and thus rearrangable.
(Inlining does increase the compiled binary size, so it's not all upside.)
Calling a function has some fixed overhead -- shuffling arguments from registers
into calling convention order on the stack; potentially growing the stack; etc.
While these overheads are small in practice... if the function is called many
(many) times, this overhead can still add up. Inlining can remove these costs!
More interestingly, function inlining can also enable *other* optimizations.
For example, a function that *would* have caused escape analysis to flunk
something out to the heap *if* that function were called alone... can
potentially be inlined in such a way that in its contextual usage,
the escape analysis flunking can actually disappear entirely.
Many other kinds of optimizations can similarly be enabled by inlining.
This makes designing library code to be inline-friendly a potentially
high-impact concern -- sometimes even more so than can be easily seen.
The exact mechanisms used by the compiler to determine what can (and should)
be inlined are subtle and change between compiler versions; as a rule of thumb,
keeping function bodies small and simple improves their chances.
### virtual function calls
Function calls which are intermediated by interfaces are called "virtual"
function calls. (You may also encounter the term "v-table" in compiler
and runtime design literature -- this 'v' stands for "virtual".)
Virtual function calls generally can't be inlined. This can have significant
effects, as described in the [inlining functions](#inlining-functions) section --
it both means function call overhead can't be removed, and it can have cascading
consequences by making other potential optimizations unreachable.
Resultant Design Features
-------------------------
### concrete implementations
We generate a concrete type for each type in the schema.
Using a concrete type means methods on it are possible to inline.
This is interesting because most of the methods are "accessors" -- that is,
a style of function that has a small body and does little work -- and these
are precisely the sort of function where inlining can add up.
There is one downside to using an exported concrete type (rather than
keeping it unexported and hidden behind an exported interface):
it means any code external to the package can produce Golang's natural "zero"
for the type. This is problematic because it's true even if the Golang "zero"
value for the type doesn't correspond to a valid value.
This is an unfortunate but overall practical tradeoff.
### embed by default
Embedding structs amortizes the count of memory allocations.
This addresses what is typically our biggest concern.
The increase in size is generally not consequential. We expect most fields
end up filled anyway, so reserving that memory up front is reasonable.
(Indeed, unfilled fields are only possible for nullable or optional fields
which are implemented as embedded.)
Assignment into embedded fields may have the cost of a memcopy.
(By contrast, if fields were pointers, assigning them would be cheap...
though at the same time, we would've had to pay the allocation cost, elsewhere.)
However, combined with [other tricks](#child-nodebuilders-point-into-embedded-fields),
a shortcut becomes possible: if we at some point used shared memory as the
scratch space for the child nodebuilder... and it's since been finalized...
and that very same pointer (into ourselves!) is now being assigned to us...
we can cheaply detect that and fastpath it. (This sounds contrived, but it's
actually the common case.)
### nullable and optional struct fields embed too
TODO intro
There is some chance of over-allocation in the event of nullable or optional
fields. We support tuning that via adjunct configuration to the code generator
which allows you to opt in to using pointers for fields; choosing to do this
will of course cause you to lose out on alloc amortization features in exchange.
TODO also resolve the loops note, at bottom
### nodebuilders point to the concrete type
We generate NodeBuilder types which contain a pointer to the type they build.
This means a single NodeBuilder and its produced Node will require
**two** allocations -- one for the NodeBuilder, and a separate one for the Node.
An alternative would be to embed the concrete Node value in the NodeBuilder,
and return a pointer to it when finalizing the creation of the Node;
however, due to the garbage collection semantics around
[internal pointers](#internal-pointers), such a design would cause the entirety
of the memory needed in the NodeBuilder to remain uncollectable as long as
completed Node is reachable! This would be an unfortunate trade --
we can do better, and will... via [racking builders](#racking-builders).
### child nodebuilders point into embedded fields
TODO this is the critical secret sauce
### racking builders
(This is where things start to get decidedly less-than-obvious.)
After generating the NodeBuilder for each type, we **additionally** generate
a "racker" type. This "racker" is a struct which embeds the NodeBuilder...
and the racker (and thus NodeBuilder) for each of the fields within a struct.
This lets us amortize the allocations for all the *builders* in the same way
as embedding in the actual value structs let us amortize allocations there.
With racking builders, we can amortize all the allocations of working memory
needed for a whole family of NodeBuilders... **and** amortize all the
allocations for the value structures into a second allocation...
and that's it, it's just those two. Furthermore, the separation means that
once the construction of the Node is done, we can drop all the NodeBuilder
memory and expect it to be immediately garbage collectable. Win!
The code for this gets a little complex, and the result also carries several
additional limitations to the usage, but it does keep the allocations finite,
and thus makes the overall performance fast.
### visibility rules
It's perfectly fine to let builders accumulate mutations... right up until
the moment where a Node is returned.
(While it's less than ideal that different nodebuilders might interact with
each other... it's technically not a violation of terms: the one final
concern is whether or not Node immutability is violated. Experiencing
spooky-action-at-a-distance between NodeBuilder instances is irrelevant.)
So, we reach the following rules:
- when a NodeBuilder.Build method returns a Node, that memory must be frozen:
- that NodeBuilder of course sets its target pointer to nil, jamming itself;
- no other set methods on the *parent* NodeBuilder may assign to that field;
- and the *parent* NodeBuilder may never return another child NodeBuilder for that field.
This set of rules around visibility lets us do amortized allocations
of a big hunk of working memory, and still comply with the familiar
small-pieces-first creation model of the NodeBuilder interface
by returning piecemeal read-only pointers into that big amortized memory hunk.
In order to satisfy these rules (namely, ensuring we never return a NodeBuilder
that addresses memory that's already been frozen) -- and do so without
consuming linearly more memory to track it! -- maps and lists end up with some
notable limitations:
- Lists can only be appended linearly, not populated in free order.
(This means we can condense the 'isFrozen' attribute to an int offset.)
- Maps can only build one new value at a time.
- Structs need no special handling -- their fields can still be set in any order.
(We know how much memory we need at compile time, so we can swallow that.)
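To make the list rule concrete, here's a hypothetical, heavily simplified sketch (none of these names are real types in this package) of how the 'isFrozen' attribute condenses into a single int offset:

```go
package main

import "fmt"

// listBuilder is a sketch: entries at index < frz have been revealed
// as read-only Nodes and may no longer be mutated.
type listBuilder struct {
	d   []int64 // stand-in for the real element type
	frz int     // everything below this offset is frozen
}

func (lb *listBuilder) Append(v int64) {
	lb.d = append(lb.d, v)
}

func (lb *listBuilder) Set(i int, v int64) {
	if i < lb.frz {
		panic("cannot set entry which has been frozen due to shared immutable memory!")
	}
	lb.d[i] = v
}

// Reveal returns a pointer into the builder's memory,
// freezing everything up to and including index i.
func (lb *listBuilder) Reveal(i int) *int64 {
	if lb.frz < i+1 {
		lb.frz = i + 1
	}
	return &lb.d[i]
}

func main() {
	lb := &listBuilder{}
	lb.Append(7)
	lb.Append(8)
	fmt.Println(*lb.Reveal(0)) // 7
	lb.Set(1, 9)               // fine: index 1 hasn't been revealed
	// lb.Set(0, 9) would panic: index 0 is frozen.
	fmt.Println(lb.d[1]) // 9
}
```

Because the frozen region is always a prefix, one integer comparison replaces a per-entry flag -- which is exactly why appends must be linear.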
Amusing Details and Edge Cases
------------------------------
### looped references
// whose job is it to detect this?
// the schema validator should check it...
// but something that breaks the cycle *there* doesn't necessarily do so for the emitted code! aggh!
// ... unless we go back to optional and nullable both making ptrs unconditionally.