1. 29 Jun, 2020 2 commits
  2. 26 Jun, 2020 2 commits
  3. 22 May, 2020 2 commits
    • gendemo package is now real generation :3 · e9455cdc
      Eric Myhre authored
      Previously, it held manually written prototypes of what gen output "would" look like.
      
      Now it's the real deal :3
    • Regen the realgen package. · 0009613a
      Eric Myhre authored
      Going to move it over to replace the (currently hand-written) gendemo
      package shortly... but spread that over a few commits, in case the
      diffs turn out interesting to look at.
  4. 19 Apr, 2020 1 commit
    • MapNStrMap3StrInt benchmarks on codegen. · 3b33e05c
      Eric Myhre authored
      Marshal is on par with basicnode.  Both basicnode and the gen code
      do a solid job of alloc amortization on reads, so the dominant cost
      remaining for both is in getting iterators.  Thus, they come out
      pretty comparable overall.
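      To make "getting iterators" concrete, here is a minimal sketch. It assumes a hypothetical node type `strIntMap` and a pared-down `MapIterator` interface; neither is the library's actual API. Handing back a fresh iterator through an interface typically forces a small heap allocation on every call, however well the node itself amortizes its own allocations. A pointer-plus-index iterator is 16 bytes on a 64-bit platform. That size is consistent with the 16 B/op, 1 alloc/op in the `Iteration` rows of the 16 Apr benchmarks, but this sketch does not prove that's where those bytes go.

      ```go
      package main

      import "fmt"

      // MapIterator is a hypothetical, pared-down iterator interface.
      type MapIterator interface {
      	Next() (key string, val int, ok bool)
      }

      // strIntMap is a hypothetical node: a small string->int map stored as
      // parallel slices so iteration order is stable.
      type strIntMap struct {
      	keys []string
      	vals []int
      }

      // mapIter is a pointer plus an index: 16 bytes on a 64-bit platform.
      type mapIter struct {
      	m   *strIntMap
      	idx int
      }

      // Next returns the next entry, or ok=false once the map is exhausted.
      func (it *mapIter) Next() (string, int, bool) {
      	if it.idx >= len(it.m.keys) {
      		return "", 0, false
      	}
      	k, v := it.m.keys[it.idx], it.m.vals[it.idx]
      	it.idx++
      	return k, v, true
      }

      // Iterator returns a fresh *mapIter boxed in an interface. Since the caller
      // may hold on to it, it usually has to live on the heap: one small
      // allocation per iteration, separate from anything the node amortizes.
      func (m *strIntMap) Iterator() MapIterator {
      	return &mapIter{m: m}
      }

      func main() {
      	n := &strIntMap{keys: []string{"a", "b", "c"}, vals: []int{1, 2, 3}}
      	it := n.Iterator()
      	for k, v, ok := it.Next(); ok; k, v, ok = it.Next() {
      		fmt.Println(k, v)
      	}
      }
      ```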
      
      Unmarshal is winning *nicely* over basicnode.  Roughly a third fewer
      allocations, and gen is about 125% faster on the clock.
      
      I haven't looked to see if unmarshal can be further improved with any
      low-hanging-fruit sorts of fixes.  Wouldn't be surprised if it can.
      
      We're gonna need more standard benchmarks... and in particular,
      need them working without the marshal/unmarshal indirections.
      Those are handy, but add a *lot* of noise from directions we're not
      necessarily interested in when looking at different node impls.
  5. 16 Apr, 2020 2 commits
    • Remove finish callback. Much faster. Bench. · 6d31b15f
      Eric Myhre authored
      If you've been following along for a while now, you don't need to see
      the benchmarks to know what's coming.  The long story short is:
      allocations are the root of all evil, and we got rid of some, and now
      things are significantly faster.
      
      Here are the numbers:
      
      basicnode (just for a baseline to compare to):
      
      ```
      BenchmarkMapStrInt_3n_AssembleStandard-8         1988986               588 ns/op             520 B/op          8 allocs/op
      BenchmarkMapStrInt_3n_AssembleEntry-8            2158921               559 ns/op             520 B/op          8 allocs/op
      BenchmarkMapStrInt_3n_Iteration-8               19679841                67.0 ns/op            16 B/op          1 allocs/op
      BenchmarkSpec_Marshal_Map3StrInt-8               1377094               870 ns/op             544 B/op          7 allocs/op
      BenchmarkSpec_Marshal_Map3StrInt_CodecNull-8     4560031               278 ns/op             176 B/op          3 allocs/op
      BenchmarkSpec_Unmarshal_Map3StrInt-8              368763              3239 ns/op            1608 B/op         32 allocs/op
      ```
      
      realgen, previously, using fcb (the finish callback):
      
      ```
      BenchmarkMapStrInt_3n_AssembleStandard-8         4293072               278 ns/op             208 B/op          5 allocs/op
      BenchmarkMapStrInt_3n_AssembleEntry-8            4643892               259 ns/op             208 B/op          5 allocs/op
      BenchmarkMapStrInt_3n_Iteration-8               20307603                59.9 ns/op            16 B/op          1 allocs/op
      BenchmarkSpec_Marshal_Map3StrInt-8               1346115               913 ns/op             544 B/op          7 allocs/op
      BenchmarkSpec_Marshal_Map3StrInt_CodecNull-8     4606304               256 ns/op             176 B/op          3 allocs/op
      BenchmarkSpec_Unmarshal_Map3StrInt-8              425662              2793 ns/op            1160 B/op         27 allocs/op
      ```
      
      realgen, new, improved:
      
      ```
      BenchmarkMapStrInt_3n_AssembleStandard-8         6138765               183 ns/op             129 B/op          3 allocs/op
      BenchmarkMapStrInt_3n_AssembleEntry-8            7276795               176 ns/op             129 B/op          3 allocs/op
      BenchmarkMapStrInt_3n_Iteration-8               19593212                67.2 ns/op            16 B/op          1 allocs/op
      BenchmarkSpec_Marshal_Map3StrInt-8               1309916               912 ns/op             544 B/op          7 allocs/op
      BenchmarkSpec_Marshal_Map3StrInt_CodecNull-8     4579935               257 ns/op             176 B/op          3 allocs/op
      BenchmarkSpec_Unmarshal_Map3StrInt-8              465195              2599 ns/op            1080 B/op         25 allocs/op
      ```
      
      So!  Assembly is about 1.5x as fast with our new-improved no-callback system as it was with gen using fcb (183 vs 278 ns/op).

      And in total, codegen structs now assemble about 3.21x as fast as the basicnode map (183 vs 588 ns/op).
      
      That's the kind of ratio I was looking for :)
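      The figures above can be recomputed from the `AssembleStandard` rows as before/after ratios of ns/op. A minimal sketch (`speedup` is just an illustrative helper):

      ```go
      package main

      import "fmt"

      // speedup reports how many times faster "after" is than "before".
      func speedup(beforeNsPerOp, afterNsPerOp float64) float64 {
      	return beforeNsPerOp / afterNsPerOp
      }

      func main() {
      	// ns/op from the BenchmarkMapStrInt_3n_AssembleStandard rows above.
      	const basicnode, genWithFcb, genNoCallback = 588.0, 278.0, 183.0

      	fmt.Printf("fcb -> no callback:       %.2fx\n", speedup(genWithFcb, genNoCallback))
      	fmt.Printf("basicnode -> no callback: %.2fx\n", speedup(basicnode, genNoCallback))
      }
      ```

      This prints 1.52x and 3.21x.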
      
      As with all of these measurements, the gaps will get much bigger on bigger corpuses.
      Some of the improvements here are O(n) -> O(1), and some apply even more heartily in deeper trees, etc.
      But it's telling that even on very small corpuses, the impact is already huge.
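      Here is one way to signal "finished" without a callback, sketched under assumptions. The types below (`mapAssembler`, `intAssembler`, `entryState`) are hypothetical stand-ins, not realgen's actual generated code. The parent owns a small state slot and embeds a single reusable child assembler. The child reports completion by writing through a pointer into that slot, so wiring it up needs no closure and no per-entry allocation.

      ```go
      package main

      import (
      	"errors"
      	"fmt"
      )

      // entryState is a hypothetical per-entry status slot owned by the parent.
      type entryState uint8

      const (
      	entryMidway entryState = iota // child handed out, value not yet assigned
      	entryDone                     // child has assigned its value
      )

      // intAssembler is a hypothetical child assembler for one int value. Instead
      // of holding a finish callback, it holds pointers into parent-owned memory.
      type intAssembler struct {
      	dst   *int
      	state *entryState
      }

      // AssignInt stores the value and signals "finished" by writing the
      // parent's state slot directly.
      func (a *intAssembler) AssignInt(v int) {
      	*a.dst = v
      	*a.state = entryDone
      }

      // mapAssembler is a hypothetical parent that assembles exactly three ints.
      type mapAssembler struct {
      	vals  [3]int
      	state entryState
      	n     int
      	child intAssembler // embedded by value and reused for every entry
      }

      // AssembleValue rewires the reused child to the next slot, refusing to do
      // so while the previous child is still unfinished.
      func (m *mapAssembler) AssembleValue() (*intAssembler, error) {
      	if m.n > 0 && m.state != entryDone {
      		return nil, errors.New("previous value not finished")
      	}
      	if m.n == len(m.vals) {
      		return nil, errors.New("too many entries")
      	}
      	m.state = entryMidway
      	m.child = intAssembler{dst: &m.vals[m.n], state: &m.state}
      	m.n++
      	return &m.child, nil
      }

      // Finish checks that every entry was assigned.
      func (m *mapAssembler) Finish() error {
      	if m.n != len(m.vals) || m.state != entryDone {
      		return errors.New("map incomplete")
      	}
      	return nil
      }

      func main() {
      	var m mapAssembler
      	for _, v := range []int{1, 2, 3} {
      		c, err := m.AssembleValue()
      		if err != nil {
      			panic(err)
      		}
      		c.AssignInt(v)
      	}
      	fmt.Println(m.vals, m.Finish()) // [1 2 3] <nil>
      }
      ```

      The child is embedded by value and reused for every entry. That means the parent's check of `m.state` before handing the child out again is what enforces ordering.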
    • Demo: codegen matching our Map3StrInt benchmark! · 2015992d
      Eric Myhre authored
      Results: mixed.
      
      The good news:
      
      The codegen works.
      
      We were able to wire it to the standard benchmarks (!  great success).
      
      It is flat out faster than any other implementation to date.
      
      The not-so-good news:
      
      It's not _as_ fast as I wanted >:(
      
      The strategy of using a finish callback ("fcb") for transmitting 'finished'
      signals from child assemblers to their parents causes an allocation.
      
      That single source of allocations turns out to be one of the dominant
      costs in the pprof profile of the benchmark.  (And it would absolutely
      be even worse if 'N' were larger than '3': since this alloc happens per
      field, it shifts us from O(1) to O(n) allocations in the number of fields.)
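      The O(1)-vs-O(n) point is easy to check in isolation. Below is a minimal sketch using `testing.AllocsPerRun` from the standard library; `parent`, `child`, `wireCallbacks`, and `wirePointers` are hypothetical names, not the generated code. Storing a closure that captures the parent into a heap-resident child forces the closure onto the heap, once per child. Storing a pointer into memory the parent already owns does not. With the standard gc toolchain, the callback version should report roughly n allocations per run and the pointer version none. Exact counts depend on the compiler's escape analysis.

      ```go
      package main

      import (
      	"fmt"
      	"testing"
      )

      // parent is a hypothetical parent assembler with one "finished" slot per field.
      type parent struct {
      	finished []bool
      }

      // child is a hypothetical child assembler; each design uses only one field.
      type child struct {
      	onFinish func() // callback design: a closure capturing the parent
      	done     *bool  // pointer design: a slot the parent already owns
      }

      // wireCallbacks gives every child a closure capturing p and i. Because the
      // closure is stored in heap memory, it must itself be heap-allocated.
      func wireCallbacks(p *parent, cs []child) {
      	for i := range cs {
      		i := i // per-iteration copy (redundant since Go 1.22, harmless before)
      		cs[i].onFinish = func() { p.finished[i] = true }
      	}
      }

      // wirePointers gives every child a pointer into the parent's existing
      // slice, so no new memory is needed.
      func wirePointers(p *parent, cs []child) {
      	for i := range cs {
      		cs[i].done = &p.finished[i]
      	}
      }

      func main() {
      	for _, n := range []int{3, 30} {
      		p := &parent{finished: make([]bool, n)}
      		cs := make([]child, n)
      		cb := testing.AllocsPerRun(100, func() { wireCallbacks(p, cs) })
      		ptr := testing.AllocsPerRun(100, func() { wirePointers(p, cs) })
      		fmt.Printf("n=%d: callback allocs/run=%v, pointer allocs/run=%v\n", n, cb, ptr)
      	}
      }
      ```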
      
      So.  Good to know!  Having end-to-end benchmarks is VERY exciting.
      
      And we're going to have to go back to the drawing board on that
      part involving a callback.