Commit 02b9b4a9 authored by Eric Myhre

Deeper prototyping of 'racker' solution, and docs.

I'm still scratching out prototypes in another directory to make it
easier to focus on the details I'm interested in rather than get
wrapped up with the kerfuffling details of the existing full
interfaces... as well as easier to try out different interfaces.
And at some point, of course, we want *codegen* for these things, while
what I'm plunking about with here is a sketch of the expected *output*.
So, this is a large step removed from the final code... but it's easier
to sketch this way and "imagine" where the templating can later appear.

This solution approach seems to be going largely well.

And we have some tests which count allocs very carefully to confirm
the desired result!

Still a lot to go.  There's large hunks of comments and unresolved
details still in this commit; I just need a checkpoint.

Some things around creation of maps are still looking tricky to
support optimizations for.  Research needed there.  A comment hunk
describes the questions, but so far there's no tradeoff-free answer.

Perhaps most usefully: here's also a checkpoint of the
"HACKME_memorylayout" document!  It also still has a few todos, but
it's the most comprehensive block of prose I've got in one place
so far.  Hopefully it will be useful reading for anyone else wanting
to get up to speed on the in-depth "why" -- it's awfully,
awfully hard to infer this kind of thing from the eschatology of
just observing where pointers are in a finished product!
Signed-off-by: Eric Myhre <hash@exultant.us>
parent 38aa8b7c
package solution
type Node interface {
LookupString(key string) Node
}
type NodeBuilder interface {
InsertByString(key string, value Node)
Build() Node
}
func (n *Stroct) LookupString(key string) Node {
switch key {
case "foo":
return &n.foo
case "bar":
return &n.bar
default:
panic("no")
}
}
func (n *Strooct) LookupString(key string) Node {
panic("nyi")
}
func (n *String) LookupString(key string) Node {
panic("nyi")
}
func NewStroctBuilder() NodeBuilder {
r := _Stroct__Racker{}
return &r._Stroct__Builder
}
func (nb *_Stroct__Builder) InsertByString(key string, value Node) {
switch key {
case "foo":
if nb.isset_foo {
panic("cannot set field repeatedly!") // surprisingly, could make this optional without breaking memviz semantics.
}
if &nb.d.foo != value.(*Strooct) { // shortcut: if shmem, no need to memcopy self to self!
nb.d.foo = *value.(*Strooct) // REVIEW: maybe this isn't necessary here and only should be checked in the racker?
}
// interestingly, no need to set frz_foo here... nb.d isn't revealed yet, so no frozen pointers into it can exist.
nb.isset_foo = true
case "bar":
nb.d.bar = *value.(*String)
nb.isset_bar = true
default:
panic("no")
}
}
func (br *_Stroct__Racker) InsertByString(key string, value Node) {
switch key {
case "foo":
if &br.d.foo == value.(*Strooct) { // not just a shortcut: if shmem, this branch insert is ONLY possible WHEN the field is frozen.
br.isset_foo = true // FIXME straighten this out to dupe less plz?! also should panic if already set, consistently (or not).
}
if br.frz_foo {
panic("cannot set field which has been frozen due to shared immutable memory!")
}
// TODO finish and sanity check in morning.
// ... seems like there's a lot less shared code than i thought.
// ... resultingly, i'm seriously questioning if builders and rackers deserve separate types.
// ... what's that for? reducing the (fairly temporary) size of builders if used sans racking style? does that... matter?
// no, by the morning light i think this can in fact just be simplified a lot and code share will emerge.
default:
panic("no")
}
}
func (nb *_Stroct__Builder) Build() Node {
return nb.d
}
func (br *_Stroct__Racker) Build() Node {
// TODO freeze something. but what? do i really need a pointer "up" that to be fed into here?
// is that even a pointer? if it depends on what this value is contained in... remember, for maps and lists, it's different than structs: bools ain't it.
// if we try to use a function pointer here: A) you're kidding.. that will not be fast.. B) check to see if it causes fucking allocs, because it fucking might, something about grabbing methods causing closures.
// AH HA. Obrigado!:
// We can actually use the invalidation of the 'd' pointer as our signal, and do the freezing update the next time either Set or GetBuilderForValue is called.
// For structs, since we'll keep a builder/racker per field, that's enough state already. You'll be able to get the childbuilders for multiple fields simultaneously (though you shouldn't rely on it, in general).
// For lists, there will be a frozen offset and an isset offset. The former will be able to advance well beyond the latter. You'll only be allowed to have one childbuilder at a time.
// For maps, we'll still do copies and allocs (reshuffling maps with large embeds is its own cost center usually better avoided). You'll only be able to use one keybuilder at a time (keys will be copied when done), and one value builder (but each use of the value builder will incur an alloc for its 'd' field).
// And here we hit one more piece of Fun that might require a small break to current interfaces: in order to make it possible for keys to reuse a swap space and not get shifted to heap... we can't return them.
// Which means no ChildBuilderForKey method at all: that interface doesn't fly.
// Oh dear. This is unpleasant. More thought needed on this.
// ... Hang on. Have I imagined this more complicated than it is?
// For structs, the key is always a string.
// For maps with enum keys... always a string. (...I don't know how to say enums of int repr would work; I don't think they do, Because Data Model rules.)
// For maps with union keys... okay, we're gonna require that those act like struct keys (unions used in keys will have to have nonptr internals; a fun detail).
// For maps with struct keys... okay, sure it has to be a string from repr land... but also yes, we have to accept the mappy form, for typed level copies and transforms.
// Yup. That last bit means a need for swap space while assembling it. Which we don't want to be forced to return as a node, because that would cause a heap alloc. Which we'd immediately undo since map keys need to be by value to behave correctly. Ow.
// I... don't know what to do about this. We could have the Build method flat out return a nil, and just "document" that. But it's pretty unappealing to make such an edge case.
// Also, yeah, we really do need an InsertByString method on MapBuilder. Incurring some nonsense boxing for string keys in structs is laughable.
// If you're thinking a workaround such as having a single swap space for building a single justString key for temporary use would help... no, sadly: "single swap space" plus visibility model won't jive like that.
// (And even if it did, ever asking an end user to write that much boilerplate is still pretty crass... as well as easily avoidable for minimal library code size cost.)
return br.d
}
package solution
// -- ipld schema -->
/*
type Stroct struct {
foo Strooct
bar String
}
type Strooct struct {
zot String
zam String
zems Strems
zigs Zigs
zee Zahn
}
type Strems [String]
type Zigs {String:Zahn}
type Zahn struct {
bahn String
}
*/
// -- the readable types -->
type Stroct struct {
foo Strooct
bar String
}
type Strooct struct {
zot String
zam String
zems Strems
zigs Zigs
zee Zahn
}
type String struct {
x string
}
type Strems struct {
x []String
}
type Zigs struct {
x map[String]Zahn
}
type Zahn struct {
bahn String
}
// -- the builders alone -->
type _Stroct__Builder struct {
d *Stroct // this pointer aims into the thing we're building (it's as yet unrevealed). it will be nil'd when we reveal it.
isset_foo bool
isset_bar bool
}
type _Strooct__Builder struct {
d *Strooct
isset_zot bool
isset_zam bool
isset_zems bool
isset_zigs bool
isset_zee bool
}
type _Strems__Builder struct {
d *Strems
// TODO
}
type _String__Builder struct {
// okay, this one is a gimme: data contains only a ptr itself, effectively.
// TODO: still might need a 'd' pointer, in case you're assigning into a list and wanna save boxing allocs?
// ... so long as we're doing wrapper types (to block blind casting), we're gonna have those boxing alloc concerns.
}
type _Zigs__Builder struct {
d *Zigs
// TODO
}
// -- the rackerized builders -->
type _Stroct__Racker struct {
_Stroct__Builder // most methods come from this, but child-builder getters will be overridden.
cb_foo _Strooct__Racker // provides child builder for field 'foo'.
frz_foo bool // if true, must never yield cb_foo again. becomes true on cb_foo.Build *or* assignment to field (latter case reachable if the value was made without going through cb_foo).
}
type _Strooct__Racker struct {
_Strooct__Builder // most methods come from this, but child-builder getters will be overridden.
// TODO: we might still actually need builders for scalars. if it's got a wrapper struct, it would incur boxing. damnit.
zems _Strems__Racker
zigs _Zigs__Racker
zee _Zahn__Racker
}
type _Strems__Racker struct {
// TODO didn't finish
}
type _Zigs__Racker struct {
// TODO didn't finish
}
type _Zahn__Racker struct {
// TODO didn't finish
}
// right, here's one wild ride we haven't addressed yet:
// if you build a thing that resides in racker-operated memory, you get a node.
// so far so good, and you can even use it multiple places.
// if assignments go through a Maybe struct?
// ... actually, this is all fine.
// the 'MaybeFoo' structs should store pointers to the thing. done.
// if the thing originated in racker-operated memory, this is free;
// if it didn't, it's a cost you would've hit somewhere else already anyway too.
// done. it's fine.
package solution
import (
"fmt"
"runtime"
"testing"
)
func init() {
runtime.GOMAXPROCS(1) // necessary if we want to do precise accounting on runtime.ReadMemStats.
}
var sink interface{}
func TestAllocCount(t *testing.T) {
memUsage := func(m1, m2 *runtime.MemStats) {
fmt.Println(
"Alloc:", m2.Alloc-m1.Alloc,
"TotalAlloc:", m2.TotalAlloc-m1.TotalAlloc,
"HeapAlloc:", m2.HeapAlloc-m1.HeapAlloc,
"Mallocs:", m2.Mallocs-m1.Mallocs,
"Frees:", m2.Frees-m1.Frees,
)
}
var m [99]runtime.MemStats
runtime.GC()
runtime.GC() // i know not why, but as of go-1.13.3, and not in go-1.12.5, i have to call this twice before we start to get consistent numbers.
runtime.ReadMemStats(&m[0])
var x Node
x = &Stroct{}
runtime.GC()
runtime.ReadMemStats(&m[1])
x = x.LookupString("foo")
runtime.GC()
runtime.ReadMemStats(&m[2])
sink = x
runtime.GC()
runtime.ReadMemStats(&m[3])
sink = nil
runtime.GC()
runtime.ReadMemStats(&m[4])
memUsage(&m[0], &m[1])
memUsage(&m[0], &m[2])
memUsage(&m[0], &m[3])
memUsage(&m[0], &m[4])
}
about memory layout
===================
Memory layout is important when designing a system for going fast.
It also shows up in exported types (whether or not they're pointers, etc).
For the most part, we try to hide these details;
or, failing that, at least make them appear consistent.
There's some deeper logic required to *pick* which way we do things, though.
Prerequisite understandings
--------------------------
The following headings contain brief summaries of information that's important
to know in order to understand how we designed the IPLD data structure
memory layouts (and how to tune them).
Most of these concepts are common to many programming languages, so you can
likely skim those sections if you know them. Others are fairly golang-specific.
### heap vs stack
The concept of heap vs stack in Golang is pretty similar to the concept
in most other languages with garbage collection, so we won't cover it
in great detail here.
The key concept to know: the *count* of allocations which are made on
the heap significantly affects performance. Allocations on the heap
consume CPU time both when made, and later, as part of GC.
The *size* of the allocations affects the total memory needed, but
does *not* significantly affect the speed of execution.
Allocations which are made on the stack are (familiarly) effectively free.
### escape analysis
"Escape Analysis" refers to the efforts the compiler makes to figure out if some
piece of memory can be kept on the stack or if it must "escape" to the heap.
If escape analysis finds that some memory can be kept on the stack,
it will prefer to do so (and this is faster/preferable because it both means
allocation is simple and that no 'garbage' is generated to collect later).
Since whether things are allocated on the stack or the heap affects performance,
the concept of escape analysis is important. The details (fortunately) are not:
For the purposes of what we need to do in our IPLD data structures,
our goal with our code is to A) flunk out and escape to heap
as soon as possible, but B) do that in one big chunk of memory at once
(because we'll be able to use [internal pointers](#internal-pointers)
thereafter).
One implication of escape analysis that's both useful and easy to note is that
whether or not you use a struct literal (`Foo{}`) or a pointer (`&Foo{}`)
*does not determine* whether that memory gets allocated on the heap or stack.
If you use a pointer, and the escape analysis can prove that the pointer
never escapes, the memory will still end up allocated on the stack.
Another way to think about this is: use pointers freely! By using pointers,
you're in effect giving the compiler *more* freedom to decide where memory resides;
in contrast, avoiding the use of pointers in method signatures, etc, will
give the compiler *less* choice about where the memory should reside,
and typically forces copying. Giving the compiler more freedom generally
has better results.
**pro-tip**: you can compile a program with the arguments `-gcflags "-m -m"` to
get lots of information about the escape analysis the compiler performs.
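As a small illustration (`Foo` here is a hypothetical type, not part of this codebase), the following program shows that `&Foo{}` alone decides nothing -- compiling it with `go build -gcflags "-m -m"` reports one `Foo` escaping and one staying put:

```go
package main

import "fmt"

type Foo struct{ a, b, c int64 }

// escapes returns the pointer, so the Foo must outlive the call:
// escape analysis will move it to the heap.
func escapes() *Foo {
	return &Foo{}
}

// stays uses the pointer only locally: despite the '&', escape analysis
// can prove it never leaves, so the Foo can live on the stack.
func stays() int64 {
	f := &Foo{a: 1}
	return f.a
}

func main() {
	fmt.Println(escapes() != nil, stays()) // true 1
}
```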
### embed vs pointer
Structs can be embedded -- e.g. `type Foo struct { field Otherstruct }` --
or referenced by a pointer -- e.g. `type Foo struct { field *Otherstruct }`.
The difference is substantial.
When structs are embedded, the layout in memory of the larger struct is simply
a concatenation of the embedded structs. This means the amount of memory
that structure takes is the sum of the size of all of the embedded things;
and by the other side of the same coin, the *count* of allocations needed
(remember! the *count* affects performance more than the *size*, as we briefly
discussed in the [heap-vs-stack](#heap-vs-stack) section) is exactly *one*.
When pointers are used instead of embedding, the parent struct is typically
smaller (pointers are one word of memory, whereas the embedded thing can often
be larger), and null values can be used... but if fields are assigned to some
other value than null, there's a very high likelihood that heap allocations
will start cropping up in the process of creating values to take pointers
to before then assigning the pointer field! (This can be subverted by
either [escape analysis](#escape-analysis) (though it's fairly uncommon),
or by [internal pointers](#internal-pointers) (which are going to turn out
very important, and will be discussed later)... but it's wise to default
to worrying about it until you can prove that one of the two will save you.)
When setting fields, another difference appears: a pointer field takes one
instruction (assuming the value already exists, and we're not invoking heap
allocation to get the pointer!) to assign,
whereas an embedded field generally signifies a memcopy, which
may take several instructions if the embedded value is large.
You can see how the choice between use of pointers and embeds results
in significantly different memory usage and performance characteristics!
(Quick mention in passing: "cache lines", etc, are also potential concerns that
can be addressed by embedding choices. However, it's probably wise to attend
to GC first. While cache alignment *can* be important, it's almost always going
to be a winning bet that GC will be a much higher impact concern.)
It is an unfortunate truth that whether or not a field can be null in Golang
and whether or not it's a pointer are two properties that are conflated --
you can't choose one independently of the other. (The reasoning for this is
based on intuitions around mechanical sympathy -- but it's worth mentioning that
a sufficiently smart compiler *could* address both the logical separation
and simultaneously have the compiler solve for the mechanical sympathy concerns
in order to reach good performance in many cases; Golang just doesn't do so.)
### interfaces are two words and may cause implicit allocation
Interfaces in Golang are always two words in size. The first word is a pointer
to the type information for what the interface contains. The second word is
a pointer to the data itself.
This means if some data is assigned into an interface value, it *must* become
a pointer -- the compiler will do this implicitly; and this is the case even if
the type info in the first word retains a claim that the data is not a pointer.
In practice, this also almost guarantees that the data in question
will escape to the heap.
(This applies even to primitives that are one word in size! At least, as of
golang version 1.13 -- keep an eye on the `runtime.convT32` functions
if you want to look into this further; the `mallocgc` call is clear to see.
There's a special case inside `malloc` which causes zero values to get a
free pass (!), but in all other cases, allocation will occur.)
Knowing this, you probably can conclude a general rule of thumb: if your
application is going to put a value in an interface, and *especially* if it's
going to do that more than once, you're probably best off explicitly handling
it as a pointer rather than a value. Any other approach will be very likely to
provoke unnecessary copy behavior and/or multiple unnecessary heap allocations
as the value moves in and out of pointer form.
(Fun note: if attempting to probe this by microbenchmarking experiments, be
careful to avoid using zero values! Zero values get special treatment and avoid
allocations in ways that aren't general.)
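The boxing behavior can be observed directly with `testing.AllocsPerRun` (a sketch; the value is deliberately nonzero and larger than a byte, since zero and tiny values can hit special no-alloc paths in the runtime):

```go
package main

import (
	"fmt"
	"testing"
)

var sink interface{}

// boxedAllocs measures allocations when a value is assigned into an interface.
func boxedAllocs() float64 {
	n := int64(123456) // nonzero and >255: avoids the runtime's special-cased values.
	return testing.AllocsPerRun(100, func() {
		sink = n // the value must become a pointer: one heap alloc per assignment.
	})
}

// unboxedAllocs measures allocations when a pointer is assigned into an interface.
func unboxedAllocs() float64 {
	n := int64(123456)
	p := &n
	return testing.AllocsPerRun(100, func() {
		sink = p // already a pointer: fits the interface's data word, no new alloc.
	})
}

func main() {
	fmt.Println(boxedAllocs(), unboxedAllocs()) // 1 0
}
```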
### internal pointers
"Internal pointers" refer to any pointer taken to some position in a piece
of memory that was already allocated somewhere.
For example, given some `type Foo struct { a, b, c Otherstruct }`, the
value of `f := &Foo{}` and `b := &f.b` will be very related: they will
differ by the size of `Otherstruct`!
The main consequence of this is: using internal pointers can allow you to
construct large structure containing many pointers... *without* using a
correspondingly large *count of allocations*. This unlocks a lot of potential
choices for how to build data structures in memory while minimizing allocs!
Internal pointers are not without their tradeoffs, however: in particular,
internal pointers have an interesting relationship with garbage collection.
When there's an internal pointer to some field in a large struct, that pointer
will cause the *entire* containing struct to be still considered to be
referenced for garbage collection purposes -- that is, *it won't be collected*.
So, in our example above, keeping a reference to `&f.b` will in fact cause
memory of the size of *three* `Otherstruct`s to be uncollectable, not one.
You can find more information about internal pointers in this talk:
https://blog.golang.org/ismmkeynote
### inlining functions
Function inlining is an important compiler optimization.
Inlining optimizes in two regards: one, can remove some of the overhead of
function calls; and two, it can enable *other* optimizations by getting the
relevant instruction blocks to be located together and thus rearrangable.
(Inlining does increase the compiled binary size, so it's not all upside.)
Calling a function has some fixed overhead -- shuffling arguments from registers
into calling convention order on the stack; potentially growing the stack; etc.
While these overheads are small in practice... if the function is called many
(many) times, this overhead can still add up. Inlining can remove these costs!
More interestingly, function inlining can also enable *other* optimizations.
For example, a function that *would* have caused escape analysis to flunk
something out to the heap *if* that function were called alone... can
potentially be inlined in such a way that in its contextual usage,
the escape analysis flunking can actually disappear entirely.
Many other kinds of optimizations can similarly be enabled by inlining.
This makes designing library code to be inline-friendly a potentially
high-impact concern -- sometimes even more so than can be easily seen.
The exact mechanisms used by the compiler to determine what can (and should)
be inlined are subtle and change between compiler versions; as a rule of thumb,
keeping function bodies small and simple improves their chances.
### virtual function calls
Function calls which are intermediated by interfaces are called "virtual"
function calls. (You may also encounter the term "v-table" in compiler
and runtime design literature -- this 'v' stands for "virtual".)
Virtual function calls generally can't be inlined. This can have significant
effects, as described in the [inlining functions](#inlining-functions) section --
it both means function call overhead can't be removed, and it can have cascading
consequences by making other potential optimizations unreachable.
Resultant Design Features
-------------------------
### concrete implementations
We generate a concrete type for each type in the schema.
Using a concrete type means methods on it are possible to inline.
This is interesting because most of the methods are "accessors" -- that is,
a style of function that has a small body and does little work -- and these
are precisely the sort of function where inlining can add up.
There is one downside to using an exported concrete type (rather than
keeping it unexported and hidden behind an exported interface):
it means any code external to the package can produce Golang's natural "zero"
for the type. This is problematic because it's true even if the Golang "zero"
value for the type doesn't correspond to a valid value.
This is an unfortunate but overall practical tradeoff.
### embed by default
Embedding structs amortizes the count of memory allocations.
This addresses what is typically our biggest concern.
The increase in size is generally not consequential. We expect most fields
end up filled anyway, so reserving that memory up front is reasonable.
(Indeed, unfilled fields are only possible for nullable or optional fields
which are implemented as embedded.)
Assignment into embedded fields may have the cost of a memcopy.
(By contrast, if fields were pointers, assigning them would be cheap...
though at the same time, we would've had to pay the allocation cost, elsewhere.)
However, combined with [other tricks](#child-nodebuilders-point-into-embedded-fields),
a shortcut becomes possible: if we at some point used shared memory as the
scratch space for the child nodebuilder... and it's since been finalized...
and that very same pointer (into ourselves!) is now being assigned to us...
we can cheaply detect that and fastpath it. (This sounds contrived, but it's
actually the common case.)
### nullable and optional struct fields embed too
TODO intro
There is some chance of over-allocation in the event of nullable or optional
fields. We support tuning that via adjunct configuration to the code generator
which allows you to opt in to using pointers for fields; choosing to do this
will of course cause you to lose out on alloc amortization features in exchange.
TODO also resolve the loops note, at bottom
### nodebuilders point to the concrete type
We generate NodeBuilder types which contain a pointer to the type they build.
This means a single NodeBuilder and its produced Node will require
**two** allocations -- one for the NodeBuilder, and a separate one for the Node.
An alternative would be to embed the concrete Node value in the NodeBuilder,
and return a pointer to it when finalizing the creation of the Node;
however, due to the garbage collection semantics around
[internal pointers](#internal-pointers), such a design would cause the entirety
of the memory needed in the NodeBuilder to remain uncollectable as long as
completed Node is reachable! This would be an unfortunate trade --
we can do better, and will... via [racking builders](#racking-builders).
### child nodebuilders point into embedded fields
TODO this is the critical secret sauce
### racking builders
(This is where things start to get decidedly less-than-obvious.)
After generating the NodeBuilder for each type, we **additionally** generate
a "racker" type. This "racker" is a struct which embeds the NodeBuilder...
and the racker (and thus NodeBuilder) for each of the fields within a struct.
This lets us amortize the allocations for all the *builders* in the same way
as embedding in the actual value structs let us amortize allocations there.
With racking builders, we can amortize all the allocations of working memory
needed for a whole family of NodeBuilders... **and** amortize all the
allocations for the value structures into a second allocation...
and that's it, it's just those two. Furthermore, the separation means that
once the construction of the Node is done, we can drop all the NodeBuilder
memory and expect it to be immediately garbage collectable. Win!
The code for this gets a little complex, and the result also carries several
additional limitations to the usage, but it does keep the allocations finite,
and thus makes the overall performance fast.
### visibility rules
It's perfectly fine to let builders accumulate mutations... right up until
the moment where a Node is returned.
(While it's less than ideal that different nodebuilders might interact with
each other... it's technically not a violation of terms: the one final
concern is whether or not Node immutability is violated. Experiencing
spooky-action-at-a-distance between NodeBuilder instances is irrelevant.)
So, we reach the following rules:
- when a NodeBuilder.Build method returns a Node, that memory must be frozen:
- that NodeBuilder of course sets its target pointer to nil, jamming itself;
- no other set methods on the *parent* NodeBuilder may assign to that field;
- and the *parent* NodeBuilder may never return another child NodeBuilder for that field.
This set of rules around visibility lets us do amortized allocations
of a big hunk of working memory, and still comply with the familiar
small-pieces-first creation model of the NodeBuilder interface
by returning piecemeal read-only pointers into that big amortized memory hunk.
In order to satisfy these rules (namely, ensuring we never return a NodeBuilder
that addresses memory that's already been frozen) -- and do so without
consuming linearly more memory to track it! -- maps and lists end up with some
notable limitations:
- Lists can only be appended linearly, not populated in free order.
(This means we can condense the 'isFrozen' attribute to an int offset.)
- Maps can only build one new value at a time.
- Structs need no special handling -- their fields can still be set in any order.
(We know how much memory we need at compile time, so we can swallow that.)
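To make the list rule concrete, here's a hypothetical, heavily simplified sketch (none of these names are real types in this package) of how the 'isFrozen' attribute condenses into a single int offset:

```go
package main

import "fmt"

// listBuilder is a sketch: entries at index < frz have been revealed
// as read-only Nodes and may no longer be mutated.
type listBuilder struct {
	d   []int64 // stand-in for the real element type
	frz int     // everything below this offset is frozen
}

func (lb *listBuilder) Append(v int64) {
	lb.d = append(lb.d, v)
}

func (lb *listBuilder) Set(i int, v int64) {
	if i < lb.frz {
		panic("cannot set entry which has been frozen due to shared immutable memory!")
	}
	lb.d[i] = v
}

// Reveal returns a pointer into the builder's memory,
// freezing everything up to and including index i.
func (lb *listBuilder) Reveal(i int) *int64 {
	if lb.frz < i+1 {
		lb.frz = i + 1
	}
	return &lb.d[i]
}

func main() {
	lb := &listBuilder{}
	lb.Append(7)
	lb.Append(8)
	fmt.Println(*lb.Reveal(0)) // 7
	lb.Set(1, 9)               // fine: index 1 hasn't been revealed
	// lb.Set(0, 9) would panic: index 0 is frozen.
	fmt.Println(lb.d[1]) // 9
}
```

Because the frozen region is always a prefix, one integer comparison replaces a per-entry flag -- which is exactly why appends must be linear.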
Amusing Details and Edge Cases
------------------------------
### looped references
// whose job is it to detect this?
// the schema validator should check it...
// but something that breaks the cycle *there* doesn't necessarily do so for the emitted code! aggh!
// ... unless we go back to optional and nullable both making ptrs unconditionally.