1. 13 Aug, 2019 1 commit
    • Eric Myhre's avatar
      Add TypeName field to ErrWrongKind. · fe4aa0a8
      Eric Myhre authored
      In general, picking a consistent strategy with this error and sticking
      with it.  The strings were starting to accumulate too much ~stuff~,
      otherwise.
      
      It might not be a bad idea to upgrade the MethodName strings to a bunch
      of consts, but I don't think it will get any harder to do that in the
      future if we should so choose either, so, meh.
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      fe4aa0a8
  2. 12 Aug, 2019 10 commits
  3. 11 Aug, 2019 8 commits
    • Eric Myhre's avatar
      Merge branch 'typed-node-progress' · 26564899
      Eric Myhre authored
      26564899
    • Eric Myhre's avatar
      Observation on error handling. · c20cc9f1
      Eric Myhre authored
      Leaving as a comment for now because it's not at the top of the
      priority-heap of refactors I want to chase right now.
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      c20cc9f1
    • Eric Myhre's avatar
      Type system compat tests outline. · a0da09a0
      Eric Myhre authored
      This is an idea on how we might try to test these features, but
      there's quite a bit of work to go -- these aren't connected yet.
      
      Making the 'newNb' functions for the runtime/wrapper-based typed.Node
      implementations will be easy enough.  Making them connect for the
      codegenerated stuff might need more interesting glue, and I'm still not
      entirely sure how to go about making good tests for something that
      involves running the compiler more than once.
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      a0da09a0
    • Eric Myhre's avatar
      Partial typed.wrapnodeStruct implementation. · 99058684
      Eric Myhre authored
      It covers the bases on reading, and has a NodeBuilder which works to
      create new nodes based on the type-level structure.
      
      (Quite a few other things about it are incomplete, and it might end
      up getting merged this way, because the goal here was primarily to
      scope out if any of our abstractions will shatter badly, and we've
      now got information on that.)
      
      This seems to indicate we are indeed on the right track, regarding
      some of those comments in the codegeneration work in the previous
      commit: check out the 'Representation' method.  Yep -- we've got
      multiple nodebuilders in our future.
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      99058684
    • Eric Myhre's avatar
      Commit some work-in-progress on typed struct gen. · 0d349cf6
      Eric Myhre authored
      This has been on the clipboard for a while now.  When I first wrote it,
      the idea that we'd need multiple NodeBuilders per typed Node seemed a
      little staggering; now that the idea has had a while to cook, it seems
      clear and uncontrovertial, so this diff now looks like far less of a
      noodle-baker than it originally was.
      (I had been planning to switch to working on runtime (that is,
      wrapper-based, non-codegen) typed nodes for a while to validate this
      idea, and now I have some partial diffs from that to land too, but
      the validation returned "true", so now I regret having multiple irons
      in the fire... sigh!  Software development is hard.)
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      0d349cf6
    • Eric Myhre's avatar
      Add missing accessors to schema.TypeStruct. · 9f6b4fce
      Eric Myhre authored
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      9f6b4fce
    • Eric Myhre's avatar
      Merge branch 'just-string' · 1c36566f
      Eric Myhre authored
      1c36566f
    • Eric Myhre's avatar
      Correct a comment about performance in justString. · d0ce3ded
      Eric Myhre authored
      There's actually a doozy of performance considerations in this commit.
      Verifying (and then correcting) that comment kicked off a surprisingly
      deep and demanding binge of research.
      
      (Some of the considerations are still only "considerations",
      unfortunately -- one key discovery is that (surprise!) conclusive
      choices require more info than microbenchmarks alone can yield.)
      
      First, the big picture:
      
      One of the things we need to be really careful about throughout a
      system like go-ipld-prime (where we're dealing with large amounts of
      serialization) is the cost of garbage collection.  Since we're often
      inside of an application's "tight loop" or "hot loop" or whatever you
      prefer to call it, if we lean on the garbage collector too heavily...
      it's very, very likely to show up a system-wide impact.
      
      So, in essence, we want to call "malloc" less.  This isn't always easy.
      Sometimes it's downright impossible: we're building large tree
      structures; it's flatly impossible to do this without allocating
      some memory.
      In other cases, there are avoidable things: and in particular, one
      common undesirable source of allocations comes from "autoboxing" around
      interfaces.  (More specifically, the name of the enemy here will often
      show up on profiling reports as "runtime.convT2I".)  Sometimes this one
      can be avoided; other times, not.
      
      Now, a more detailed picture:
      
      There are actually several functions in the runtime that relate to
      memory allocation and garbage collection, and the less we use any of
      them, the better; but also, they are not all created equal.
      
      These are the functions that are of interest:
      
      - runtime.convT2I / runtime.convT2E
      - runtime.newObject
      - runtime.writeBarrier / runtime.gcWriteBarrier
      - runtime.convTstring / etc
      
      Most of these functions call `runtime.mallocgc` internally, which is
      why they're worth of note.  (writeBarrier/gcWriteBarrier are also
      noteworthy, but are different beasts.)
      
      Different kinds of implementations of something like `justString` will
      cause the generated assembly to contain calls to various combinations
      of these runtime functions when they're converted into a `Node`.
      
      These are the variations considered:
      
      - Variation 1: `type justString string`:
        results in `runtime.convTstring`.
      
      - Variation 2: `type justString struct { x string }`:
        results in `runtime.convT2I`.
      
      - Variation 3: as above, but returning `&justString{val}`:
        results in `runtime.newobject` *and* its friends
        `runtime.writeBarrier` and `runtime.gcWriteBarrier`.
      
      The actual performance of these... it depends.
      
      In microbenchmarks, I've found the above examples are roughly:
      
      - Variation 1:    23.9 ns/op     16 B/op     1 allocs/op
      - Variation 2:    31.1 ns/op     16 B/op     1 allocs/op
      - Variation 3:    23.0 ns/op     16 B/op     1 allocs/op
      
      So, a couple things about this surprised me; and a couple things I'm
      still pretty sure are just microbenchmarks being misleading.
      
      First of all: *all* of these call out to `mallocgc` internally.  And so
      we see an alloc-per-op reported in all three of them.  (This actually
      kinda bugs me, because I feel like we should be able to fit all the
      requisite information in the first case directly into the interface,
      which is, if I understand correctly, already always two words; and
      arguably, the compiler might be smart enough to do this in the second
      case as well.  But I'm wrong about this, for whatever reason, so let's
      just accept this one and move along.)  But they vary in time.  Why?
      
      Variation 2 seems to stand out as slower.  Interestingly, it turns out
      `convT2E` and `convT2I` are extra problematic because they involve a
      call of `typedmemmove` internally -- as a comment in the source says,
      there's both an allocation, a zeroing, and then a copy here (at least,
      as of go1.12); this is a big bummer.  In addition, even before getting
      that deep, if you check out the disassembly of just our functions:
      for our second variation, as inlined into our microbenchmark, there are
      9 instructions, plus 1 'CALL'; vs only 3+1 for the first variation.
      This memmove and extra instructions seems to be the explainer for why
      our second variation (`struct{string}`) is significantly (~8ns) slower.
      
      (And here I thought variation two would do well!  A struct with one
      field is the same size as the field itself; a string is one word
      of pointer; and an interface has another word for type; and that's
      our two words, so it should all pack, and on the stack!  Alas: no.)
      
      Now back up to Variation 1 (just a typedef of a string): this one
      invokes `runtime.convTstring`, and while that does invoke `mallocgc`,
      there's a detail about how that's interesting: it does it with an ask
      for a small number of bytes.  Specifically, it asks for... well,
      `unsafe.Sizeof(string)`, so that varies by platform, but it's "tiny".
      What's "tiny" mean?  `mallocgc` has a specific definition of this, and
      you can see it by grepping the runtime package source for
      "maxTinySize": it's 16 bytes.  Things under this size get special
      treatment from a "tiny allocator"; this seems to be why
      `runtime.convTstring` is relatively advantaged.
      
      (You can see benchmarks relating to this in the runtime package itself:
      try `go test -run=x -bench=Malloc runtime`.  There's a *huge* cliff
      between MallocLargeStruct versus the rest of its fellows.)
      
      Variation 3 also appears competitive.  This one surprises me, and this
      is where I still feel like microbenchmarks must be hoodwinking.
      The use of `runtime.newobject` seems to hit the same corners as
      `runtime.convTstring` at runtime in our situation here: it's "tiny";
      that's neat.  More confusingly, though, `runtime.writeBarrier` and
      `runtime.gcWriteBarrier` *should* be (potentially) very high cost
      calls.  And for some reason, they're not.  This particular workload in
      the microbenchmark must just-so-happen to tickle things in such a way
      that these calls are free (literally; within noise levels), and I
      suspect that's a happy coincidence in the benchmark that won't at all
      hold in real usage -- as any amount of real memory contention appears,
      the costs of these gc-related calls can be expected to rise.
      
      I did a few more permutations upon Variations 2 and 3, just out of
      curiosity and for future reference, adding extra fields to see if any
      interesting step functions revel themselves.  Here's what I found:
      
      - {str,int,int,int,int} is 48 bytes; &that allocs the same amount; in speed, & is faster; 33ns vs 42ns.
      - {str,int,int,int} is 48 bytes; &that allocs the same amount; in speed, & is faster; 32ns vs 42ns.
      - {str,int,int} is 32 bytes; &that allocs the same amount; in speed, & is faster; 32ns vs 39ns.
      - {str,int} is 32 bytes; &that allocs the same amount; in speed, & is faster; 31ns vs 38ns.
      - {str} is 16 bytes; &that allocs the same amount; in speed, & is faster; 24ns vs vs 32ns.
      
      Both rise in time cost as the struct grows, but the non-pointer
      variant grows faster, and it experiences a larger step of increase
      each time the size changes (which in turn steps because of alignment).
      The &{str} case is noticeably faster than the apparently linear
      progression that starts upon adding a second field; since we see the
      number 16 involved, it seems likely that this is the influence of the
      "tiny allocator" in action, and the rest of the values are linear
      relative to each other because they're all over the hump where the
      tiny allocator special path disengages.
      
      (One last note: there's also a condition about "noscan" which toggles
      the "tiny allocator", and I don't fully understand this detail.  I'd
      have thought strings might count as a pointer, which would cause our
      Variation 3 to not pass the `t.kind&kindNoPointers` check; but the
      performance cliff observation described in the previous paragraph
      seems to empirically say we're not getting kicked out by "noscan".
      (Either that or there's some yet-other phenomenon I haven't sussed.))
      
      (Okay, one even-laster note: in the course of diving around in the
      runtime malloc code, I found an interesting comment about using memory
      "from the P's arena" -- "P" being one of the letters used when talking
      about goroutine internals -- and I wonder if that contributes to our
      little mystery about how the `gcWriteBarrier` method seems so oddly
      low-cost in these microbenchmarks: perhaps per-thread arenas combined
      with lack of concurrency in the benchmark combined with quickly- and
      sequentially-freed allocations means any gcWriteBarrier is essentially
      reduced to nil.  However, this is just a theory, and I won't claim to
      totally understand the implications of this; commenting on it here
      mostly to serve as a pointer to future reading.)
      
      ---
      
      Okay.  So what comes of all this?
      
      - I have two choices: attempt to proceed further down a rabbithole of
      microbenchmarking and assembly-splunking (and next, I think, patching
      debug printfs into the compiler and runtime)... or, I can see that last
      one as a step too far for today, pull up, commit this, and return to
      this subject when there's better, less-toy usecases to test with.
      I think the latter is going to be more productive.
      
      - I'm going to use the castable variation here (Variation 1).  This
      won't always be the correct choice: it only flies here because strings
      are immutable anyway, and because it's a generic storage implementation
      rather than having any possibility of additional constraints, adjuncts
      from the schema system, validators, etc; and therefore, I don't
      actually care if it's possible to cast things directly in and out of
      this type (since doing so can't break tree immutability, and it can't
      break any of those other contracts because there aren't any).
      
      - A dev readme file appears.  It discusses what choices we might make
      for other cases in the future.  It varies by go native kind; and may
      be different for codegen'd types vs general storage implementations.
      
      - Someday, I'd like to look at this even further.  I have a persistent,
      nagging suspicion that it should be possible to make more steps in the
      direction of "zero cost abstractions" in this vicinity.  However, such
      improvements would seem to be pretty deep in the compiler and runtime.
      Someday, perhaps; but today... I started this commit in search of a
      simple diff to a comment!  Time to reel it in.
      
      Whew.
      Signed-off-by: default avatarEric Myhre <hash@exultant.us>
      d0ce3ded
  4. 08 Aug, 2019 5 commits
    • Eric Myhre's avatar
      Add a 'justString' implementation. · 040f47d2
      Eric Myhre authored
      This is meant to be about the cheapest implementation of Node possible.
      
      It's important to have this given how often we need string nodes.
      In the example of the moment: in writing some typed.Node systems, I
      need some intermediate string nodes just to nudge around map processes.
      Those nodes need to be handy and their implementation must be cheap.
      
      It's also useful that we can have a constructor which doesn't return
      errors, so it's easier to chain.  The NodeBuilder features are
      restrained to needing to be able to yield errors for interface
      satisfaction purposes, in turn needed for more complex features.
      But when building a specific node, there's no reason we need to go
      through NodeBuilder at all.
      
      I'm putting this in the ipldfree package at the moment.  I wouldn't
      mind putting a constructor for this in the root ipld package as well,
      but that would either mean a cyclic import or putting this
      implementation as a blessed one in the main package.
      (Putting this as a blessed default implementation in the main package
      is possible, and can be considered later, but is a "not today" thing;
      having these implementations in a subpackage is a good forcing-function
      for development to make sure we don't accidentally leave any important
      things unexported in the main package.)
      040f47d2
    • Eric Myhre's avatar
      Merge pull request #25 from ipld/bugs/fix-recursion-and-union-selectors · 5ace6441
      Eric Myhre authored
      Fix interaction between ExploreRecursive and ExploreUnion
      5ace6441
    • Eric Myhre's avatar
      Merge pull request #20 from ipld/feat/better-traversal · 63048dbf
      Eric Myhre authored
      More complete selector traversal
      63048dbf
    • Eric Myhre's avatar
      Merge pull request #19 from ipld/feat/selector-builder · 499d204a
      Eric Myhre authored
      Builder for selectors
      499d204a
    • Eric Myhre's avatar
      Merge pull request #26 from ipld/feat/recursive-selectors · 30cf80e2
      Eric Myhre authored
      Feat/recursive selectors
      30cf80e2
  5. 06 Aug, 2019 11 commits
  6. 05 Aug, 2019 4 commits
  7. 29 Jul, 2019 1 commit