Commits · 1bfa8fb30b68cc28c8b58a07e384f7f695deb250 · ld / go-ld-prime

26 Apr, 2021 1 commit
- Allow emitting & parsing of bytes per dagjson codec spec · 1bfa8fb3
  Will Scott authored Apr 26, 2021
```
fix #165
```
  1bfa8fb3
25 Apr, 2021 2 commits

Eric Myhre authored Apr 26, 2021

These are somewhat overdue, and clarify what features are supported,
and also note some discrepancies in implementation versus the spec.

(As I'm taking this inventory of discrepancies, there's admittedly
rather more than I'd like... but step 1: document the current truth.
Prioritizing which things to hack on, in the field of infinite possible
prioritizations of things that need hacking on, can be a step 2.)

92c695e9

Update package docs. · e53a83e8

Eric Myhre authored Apr 25, 2021

These had drifted a bit, and I just noticed they're due for some freshening up.

e53a83e8

19 Apr, 2021 1 commit
- Merge pull request #163 from mvdan/gen-go-gofmt · 246262ea
  Eric Myhre authored Apr 19, 2021
```
schema/gen/go: apply gofmt automatically
```
  246262ea
09 Apr, 2021 4 commits

schema/gen/go: apply gofmt automatically · 4bb2f097

Daniel Martí authored Apr 09, 2021

Now that we buffer the output, using go/format is trivial.

This makes the default behavior better, and means not having to use an
extra gofmt go:generate step everywhere.
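
As a rough illustration of why buffering makes this easy, here is a minimal sketch that formats generated source in memory with the standard library's `format.Source`. The `generate` function and its output are hypothetical and are not the generator's actual code.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
)

// generate is a hypothetical stand-in for a code generator. It writes
// deliberately unformatted Go source into an in-memory buffer, then returns
// the gofmt-formatted result.
func generate(typeName string) ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteString("package demo\n")
	fmt.Fprintf(&buf, "type %s struct{\nx   int\n}\n", typeName)
	// format.Source parses the buffered bytes and re-prints them in gofmt
	// style. It returns an error if the source does not parse.
	return format.Source(buf.Bytes())
}

func main() {
	src, err := generate("Foo")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(src))
}
```

Because the whole file is already in memory, formatting is one extra call before the bytes are written out, and a syntax error in the generated code surfaces as an error at generation time.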

4bb2f097

schema/gen/go: fix remaining vet warnings on generated code · 2359e698

Daniel Martí authored Apr 09, 2021

And re-generate all code in this module.

This gets us to a point where go-codec-dagpb has zero vet warnings, for
example. schema/dmt still has a few warnings, but those are trickier to
fix, so will require another PR.

2359e698

Merge pull request #161 from mvdan/generate-buffer · 7bca6b48
Eric Myhre authored Apr 09, 2021
```
schema/gen/go: batch file writes via a bytes.Buffer
```
7bca6b48

schema/gen/go: batch file writes via a bytes.Buffer · 88f72f58

Daniel Martí authored Apr 09, 2021

With this change, running 'go generate ./...' on the entire module while
running gopls on one of its files drops gopls's CPU spinning from ~25s
to well under a second. They should improve that anyway, but there's no
reason for the tens of thousands of tiny FS writes on our end either.

The time to run 'go generate ./...' itself is largely unaffected; it
goes from ~1.2s to ~1.1s, judging by a handful of runs.
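
The effect on the number of writes can be sketched as follows. `countingWriter`, `emitTypes`, and `emitBuffered` are hypothetical names chosen for illustration; the counting writer stands in for a file so the example needs no disk access.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingWriter stands in for a file and counts how many writes reach it.
type countingWriter struct{ writes int }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	return len(p), nil
}

// emitTypes writes one small fragment per type name directly to w.
func emitTypes(w io.Writer, names []string) error {
	for _, name := range names {
		if _, err := fmt.Fprintf(w, "type %s struct{}\n", name); err != nil {
			return err
		}
	}
	return nil
}

// emitBuffered collects the same fragments in memory and hands them to w
// with a single Write call.
func emitBuffered(w io.Writer, names []string) error {
	var buf bytes.Buffer
	if err := emitTypes(&buf, names); err != nil {
		return err
	}
	_, err := w.Write(buf.Bytes())
	return err
}

func main() {
	names := []string{"Foo", "Bar", "Baz"}
	direct, batched := &countingWriter{}, &countingWriter{}
	if err := emitTypes(direct, names); err != nil {
		panic(err)
	}
	if err := emitBuffered(batched, names); err != nil {
		panic(err)
	}
	fmt.Println(direct.writes, batched.writes) // prints "3 1"
}
```

With a real file as the destination, each direct write is a separate filesystem event that a watcher such as an editor's language server may react to, while the buffered variant produces one.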

88f72f58

07 Apr, 2021 1 commit

schema/gen/go: avoid Maybe pointers for small types · f3d42e04

Daniel Martí authored Apr 06, 2021

If we know that a schema type can be represented in Go with a small
number of bytes, using a pointer to store its "maybe" is rarely a good
idea. For example, an optional string only weighs twice as much as a
pointer, so a pointer adds overhead and will rarely save any
memory.

Add a function to work out the byte size of a schema.TypeKind, relying
on reflection and the basicnode package. Debug prints are also present
if one wants to double-check the numbers. As of today, they are:

	sizeOf(small): 32 (4x pointer size)
	sizeOf(Bool): 1
	sizeOf(Int): 8
	sizeOf(Float): 8
	sizeOf(String): 16
	sizeOf(Bytes): 24
	sizeOf(List): 24
	sizeOf(Map): 32
	sizeOf(Link): 16

Below is the result on go-merkledag's BenchmarkRoundtrip after
re-generating go-codec-dagpb with this change. Note that the dag-pb
schema contains multiple optional fields, such as strings.

	name         old time/op    new time/op    delta
	Roundtrip-8    4.24µs ± 3%    3.78µs ± 0%  -10.87%  (p=0.004 n=6+5)

	name         old alloc/op   new alloc/op   delta
	Roundtrip-8    6.38kB ± 0%    6.24kB ± 0%   -2.26%  (p=0.002 n=6+6)

	name         old allocs/op  new allocs/op  delta
	Roundtrip-8       103 ± 0%        61 ± 0%  -40.78%  (p=0.002 n=6+6)

Schema typekinds which don't directly map to basicnode prototypes, such
as structs and unions, are left as a TODO for now.

I did not do any measurements to arrive at the magic number of 4x, which
is documented in the code. We might well increase it in the future, with
more careful benchmarking. For now, it seems like a conservative starting
point that should cover all basic types.
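
A minimal sketch of the size check, assuming plain Go builtin types rather than the basicnode types the commit measures (so the numbers can differ from the table above, notably for maps). The names `sizeOf`, `isSmall`, and `smallFactor` are illustrative, and the inclusive comparison is an assumption based on the 32-byte Map being described as covered.

```go
package main

import (
	"fmt"
	"reflect"
)

// smallFactor is the multiple of the pointer size up to which a value is
// treated as small. The 4x figure comes from the commit message.
const smallFactor = 4

// pointerSize is the size of a pointer on the current platform.
var pointerSize = reflect.TypeOf((*int)(nil)).Size()

// sizeOf returns the in-memory size of v's type, excluding anything it
// points to (for example, a string's backing bytes).
func sizeOf(v interface{}) uintptr {
	return reflect.TypeOf(v).Size()
}

// isSmall reports whether a value of v's type would be stored directly
// rather than behind a pointer. The inclusive bound is an assumption.
func isSmall(v interface{}) bool {
	return sizeOf(v) <= smallFactor*pointerSize
}

func main() {
	samples := []struct {
		name string
		v    interface{}
	}{
		{"Bool", false},
		{"Int", int64(0)},
		{"Float", float64(0)},
		{"String", ""},
		{"Bytes", []byte(nil)},
		{"large array", [64]byte{}},
	}
	fmt.Printf("sizeOf(small): %d\n", smallFactor*pointerSize)
	for _, s := range samples {
		fmt.Printf("sizeOf(%s): %d small=%v\n", s.name, sizeOf(s.v), isSmall(s.v))
	}
}
```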

Finally, re-generate within this repo.

f3d42e04

03 Apr, 2021 1 commit
- fix readme formatting typo · fc2a58f3
  Eric Myhre authored Apr 03, 2021
  
  fc2a58f3
02 Apr, 2021 1 commit
- Merge pull request #158 from ipld/feat/add-reification-to-link-system · 74065785
  Eric Myhre authored Apr 02, 2021
```
feat(linksystem): add reification to LinkSystem
```
  74065785
01 Apr, 2021 1 commit

feat(linksystem): add reification to LinkSystem · 9340af55

hannahhoward authored Mar 31, 2021

Add an optional reifier to the link system process. The reason to do this is to capture the link
system itself when reification happens, in case it needs to be put into the node; the link system is often not
accessible by the time you have a node. Another alternative would be to make this specific to
selector traversal, similar to LinkNodePrototypeChooser.
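
The idea can be sketched with simplified stand-in types. Everything below (`Node`, `LinkSystem`, `Reifier`, `Load`) is a hypothetical reduction for illustration and does not reproduce the library's actual signatures.

```go
package main

import "fmt"

// Node is a simplified stand-in for a decoded data model node.
type Node interface {
	Describe() string
}

type plainNode struct{ data string }

func (n plainNode) Describe() string { return "plain:" + n.data }

// reifiedNode wraps a plain node and keeps a reference to the link system
// that loaded it, so it can follow further links later on.
type reifiedNode struct {
	inner Node
	lsys  *LinkSystem
}

func (n reifiedNode) Describe() string { return "reified(" + n.inner.Describe() + ")" }

// LinkSystem is a hypothetical, heavily reduced link system.
type LinkSystem struct {
	// Reifier is optional. When nil, loaded nodes are returned unchanged.
	Reifier func(n Node, lsys *LinkSystem) (Node, error)
}

// Load pretends to load and decode a block, then applies the reifier,
// passing the link system itself along.
func (lsys *LinkSystem) Load(data string) (Node, error) {
	var n Node = plainNode{data}
	if lsys.Reifier == nil {
		return n, nil
	}
	return lsys.Reifier(n, lsys)
}

func main() {
	lsys := &LinkSystem{}
	n, _ := lsys.Load("x")
	fmt.Println(n.Describe())

	lsys.Reifier = func(n Node, l *LinkSystem) (Node, error) {
		return reifiedNode{inner: n, lsys: l}, nil
	}
	n, _ = lsys.Load("x")
	fmt.Println(n.Describe())
}
```

Running the hook inside the load path is what gives it access to the link system; code that only receives the finished node would have no way to recover it.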

9340af55

24 Mar, 2021 2 commits
- Merge pull request #149 from ipld/feat/configure-hash-on-load · dc342a99
  Eric Myhre authored Mar 24, 2021
```
Add option to tell link system storage is trusted and we can skip hash on read
```
  dc342a99
- feat(linksystem): add option to disable hash on read · 6a262a3c
  hannahhoward authored Mar 04, 2021
```
Add a boolean that specifies whether the storage is trusted. If it is, skip hashing on reads.
```
  6a262a3c
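
A minimal sketch of what such a flag changes on the read path, assuming SHA2-512 as the hash. The names `loadVerified`, `digestOf`, and `errHashMismatch` are hypothetical and are not the library's API.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"errors"
	"fmt"
)

var errHashMismatch = errors.New("hash mismatch")

// digestOf returns the SHA2-512 digest of data.
func digestOf(data []byte) []byte {
	sum := sha512.Sum512(data)
	return sum[:]
}

// loadVerified returns data as read from storage. Unless the storage is
// marked as trusted, it first checks that the data matches wantDigest.
func loadVerified(data, wantDigest []byte, trustedStorage bool) ([]byte, error) {
	if trustedStorage {
		// Skip the hash entirely: cheaper, but corruption goes unnoticed.
		return data, nil
	}
	if !bytes.Equal(digestOf(data), wantDigest) {
		return nil, errHashMismatch
	}
	return data, nil
}

func main() {
	want := digestOf([]byte("hello"))
	corrupted := []byte("hellp")

	_, err := loadVerified(corrupted, want, false)
	fmt.Println("untrusted storage:", err)

	_, err = loadVerified(corrupted, want, true)
	fmt.Println("trusted storage:", err)
}
```

The second call shows the trade-off: with trusted storage the corrupted block is returned without error, so the flag only makes sense when the storage layer already guarantees integrity.
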
23 Mar, 2021 5 commits
- Merge pull request #153 from ipld/codec/cbor · 678a428b
  Eric Myhre authored Mar 23, 2021
```
implement non-dag cbor codec
```
  678a428b
- implement non-dag cbor codec · 6e55e9be
  Will Scott authored Mar 23, 2021
  
  6e55e9be
- Merge pull request #152 from ipld/codec/json · f02df08b
  Eric Myhre authored Mar 23, 2021
```
add non-dag json codec
```
  f02df08b
- json Encode fails on links · 4f959282
  Will Scott authored Mar 22, 2021
  
  4f959282
- json as separate codec · 5ab5068c
  Will Scott authored Mar 22, 2021
  
  5ab5068c
22 Mar, 2021 1 commit
- add non-dag json codec · 6c9c90d1
  Will Scott authored Mar 22, 2021
  
  6c9c90d1
15 Mar, 2021 5 commits
- typo fixes · 94cf448d
  Eric Myhre authored Mar 15, 2021
```
:'(
```
  94cf448d
- mark v0.9.0 · a4e922a5
  Eric Myhre authored Mar 15, 2021
  
  a4e922a5
- Changelog: more backfill :) · 7fc91fed
  Eric Myhre authored Mar 15, 2021
  
  7fc91fed
- hackme: about merge strategies. · 00affde8
  Eric Myhre authored Mar 15, 2021
```
This is just documenting our de facto policies already.
```
  00affde8
- Dropping .gopath and other unmaintained scripts. · a9147103
  Eric Myhre authored Mar 15, 2021
```
Unless we can keep these up to date with go-mod state automatically
(and we don't currently have tools for that), maintaining both these
systems is problematic.
```
  a9147103
12 Mar, 2021 9 commits

Merge pull request #143 from ipld/linksystem · f6f71240
Eric Myhre authored Mar 12, 2021
```
introduce LinkSystem
```
f6f71240
Update changelog. · 8ef5eabf
Eric Myhre authored Mar 12, 2021

8ef5eabf

Better document why some of the branches around direct byte slice access matter in LinkSystem. · d9d56825

Eric Myhre authored Mar 12, 2021

(It can be hard to intuit this just by reading the code, because some
of the key relevance is actually in *other* functions that might not
be in this repo at all!)

d9d56825

Update some comments regarding multihash. · 6386588d
Eric Myhre authored Mar 12, 2021
```
(Some things got better.  Several others in the area still have not.)
```
6386588d
fix(codec/raw): update for linksystem · 8a7497fe
hannahhoward authored Mar 11, 2021

8a7497fe

Merge branch 'master' into linksystem · 705307f1

Eric Myhre authored Mar 12, 2021

Resolves conflicts in go.sum.
(They were conflicts in a textual sense only, but semantically trivial;
application of `go mod tidy` was sufficient.)

705307f1

Multicodec registry now guarded by functions. · 8fcc6767
Eric Myhre authored Mar 12, 2021

8fcc6767

LinkSystem: detect direct byte access features on readers and attempt to use it. · 08f13e56

Eric Myhre authored Mar 12, 2021

I'm still not fully convinced this is always a win for performance,
but as discussed in
https://github.com/ipld/go-ipld-prime/pull/143#discussion_r582335774 ,
there was past evidence, and reproducing that takes work -- so, we're
going to try to be relatively conservative here and keep this logical
branch behavior in place for now.

(The reason this may be worth re-examining is that the change to the
hashing interface probably gets rid of one big source of copies.
That doesn't answer the holistic performance question alone, though;
it just reopens it.)
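
The feature detection described here can be sketched with an interface assertion. `bytesAccessor`, `memBlock`, `onlyReader`, and `readAll` are hypothetical names for illustration and are not the library's code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// bytesAccessor is the optional feature being detected: a reader that can
// hand out its full contents without copying.
type bytesAccessor interface {
	Bytes() []byte
}

// memBlock is an in-memory reader that also exposes its bytes directly.
type memBlock struct {
	*bytes.Reader
	raw []byte
}

func newMemBlock(b []byte) memBlock { return memBlock{bytes.NewReader(b), b} }

func (m memBlock) Bytes() []byte { return m.raw }

// onlyReader hides every method of the wrapped reader except Read.
type onlyReader struct{ io.Reader }

// readAll uses direct byte access when the reader offers it and falls back
// to streaming the contents into a fresh slice otherwise.
func readAll(r io.Reader) (data []byte, direct bool, err error) {
	if ba, ok := r.(bytesAccessor); ok {
		return ba.Bytes(), true, nil
	}
	data, err = io.ReadAll(r)
	return data, false, err
}

func main() {
	_, direct, _ := readAll(newMemBlock([]byte("hi")))
	fmt.Println("memBlock direct:", direct)

	_, direct, _ = readAll(onlyReader{newMemBlock([]byte("hi"))})
	fmt.Println("wrapped direct:", direct)
}
```

Whether the shortcut is a net win depends on what the caller does with the bytes afterwards, which is the open question the message raises.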

08f13e56

Update go-multihash; this upstreams much of the streaming hash work. · 45b8e9c8
Eric Myhre authored Mar 12, 2021

45b8e9c8

10 Mar, 2021 1 commit

Readme updates. · 77d3dd2b

Eric Myhre authored Mar 10, 2021

Updated the phrasing on relationship to alternative golang libraries.
The state of play there is a bit more advanced than the last time I
updated this readme.

Was also able to add quite a few more statements on distinguishing
features vs alternative libraries.

More discussion of version strategy.  It's de facto what I've been doing
already; now it is documented.

77d3dd2b

05 Mar, 2021 1 commit

codec/raw: implement the raw codec · 7e692244

Daniel Martí authored Mar 05, 2021

It's small, it's simple, and it's already widely used as part of unixfs.
So there's no reason it shouldn't be part of go-ipld-prime.

The codec is tiny, but has three noteworthy parts: the Encode and Decode
funcs, the cidlink multicodec registration, and the Bytes method
shortcut. Each of these has its own dedicated regression test.

I'm also using this commit to showcase the use of quicktest instead of
go-wish. The result is extremely similar, but with less dot-import
magic. For example, if I remove the Bytes shortcut in Decode:

	--- FAIL: TestDecodeBuffer (0.00s)
	    codec_test.go:115:
	        error:
	          got non-nil error
	        got:
	          e"could not decode raw node: must not call Read"
	        stack:
	          /home/mvdan/src/ipld/codec/raw/codec_test.go:115
	            qt.Assert(t, err, qt.IsNil)

7e692244

25 Feb, 2021 4 commits

Extract multi{codec,hash} registries better. · 8fef5312

Eric Myhre authored Feb 25, 2021

And, make a package which can be imported to register "all" of the
multihashes.  (Or at least all of them that you would've expected
from go-multihash.)

There are also packages that are split roughly per the transitive
dependency it brings in, so you can pick and choose.

This cascaded into more work than I might've expected.
Turns out a handful of the things we have multihash identifiers for
actually *do not* implement the standard hash.Hash contract at all.
For these, I've made small shims.

Test fixtures across the library switch to using sha2-512.
Previously I had written a bunch of them to use sha3 variants,
but since that is not in the standard library, I'm going to move away
from that so as not to re-bloat the transitive dependency tree
just for the tests and examples.

8fef5312

Introduce LinkSystem. · a1482fe2

Eric Myhre authored Feb 04, 2021

This significantly reworks how linking is handled.

All of the significant operations involved in storing and loading
data are extracted into their own separate features, and the LinkSystem
just composes them.  The big advantage of this is we can now add as
many helper methods to the LinkSystem construct as we want -- whereas
previously, adding methods to the Link interface was a difficult
thing to do, because that interface shows up in a lot of places.

Link is now *just* treated as a data holder -- it doesn't need logic
attached to it directly.  This is much cleaner.

The way we interact with the CID libraries is also different.
We're doing multihash registries ourselves, and breaking our direct
use of the go-multihash library.  The big upside is we're now using
the familiar and standard hash.Hash interface from the golang stdlib.
(And as a bonus, that actually works streamingly; go-multihash didn't.)
However, this also implies a really big change for downstream users:
we're no longer baking as many hashes into the new multihash registry
by default.
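
A minimal sketch of a registry built on the standard `hash.Hash` interface, with access guarded by functions. The names `register`, `lookup`, and `hashStream` are hypothetical; the codes 0x12 and 0x13 are the multicodec table entries for sha2-256 and sha2-512.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"fmt"
	"hash"
	"io"
	"strings"
	"sync"
)

var (
	mu       sync.RWMutex
	registry = map[uint64]func() hash.Hash{}
)

// register associates a multihash code with a hasher factory.
func register(code uint64, factory func() hash.Hash) {
	mu.Lock()
	defer mu.Unlock()
	registry[code] = factory
}

// lookup returns the hasher factory registered for code, if any.
func lookup(code uint64) (func() hash.Hash, bool) {
	mu.RLock()
	defer mu.RUnlock()
	f, ok := registry[code]
	return f, ok
}

func init() {
	// Only hashes from the standard library are registered by default here.
	register(0x12, sha256.New) // sha2-256
	register(0x13, sha512.New) // sha2-512
}

// hashStream feeds r into the hasher incrementally, so the input never has
// to be held in memory as a whole.
func hashStream(code uint64, r io.Reader) ([]byte, error) {
	factory, ok := lookup(code)
	if !ok {
		return nil, fmt.Errorf("no hasher registered for multihash code 0x%x", code)
	}
	h := factory()
	if _, err := io.Copy(h, r); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

func main() {
	digest, err := hashStream(0x13, strings.NewReader("hello"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d-byte digest\n", len(digest))

	// A code nobody registered fails, which is the downstream-visible
	// consequence of shipping fewer hashes by default.
	_, err = hashStream(0x16, strings.NewReader("hello"))
	fmt.Println(err)
}
```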

a1482fe2

Merge-ignore branch 'schema-dmt-unification' · 7d918468

Eric Myhre authored Feb 25, 2021

This commit is a merge commit, but using the "ours" strategy -- meaning
that the contents are functionally ignored and have no impact on the
branch they're being merged into.

Something went off the rails at some point here, and I'm not sure what,
but I'm calling it quits and this work will have to be rebooted in
a fresh set of patches sometime in the future.

The "hackme" file discussing the choices: probably salvageable.
It's not perfect (some sections read harsher than others; should redo
it with a table that applies the same checklist to each option),
and obviously the final conclusion it came to is questionable, but
most of the discussion is probably good.

The "carrier types" sidequest: extremely questionable.
The micro-codegenerator I made for that lives on in its own repo:
see https://github.com/warpfork/go-quickimmut .  But overall,
I don't think this went well.  With these, we got immutability,
and we got it without an import cycle or any direct dependence
on the IPLD Schema codegen output, but... we lost other things,
like the ability to differentiate empty lists from nil lists,
and it drops map ordering again, etc.  All of these are problems
solved by the IPLD Schema codegen, and losing those features again
was just... painful.  And the more places we're stuck flipping
between various semantics for representing "Maybe", the vastly
more likely it becomes to be farming bugs; this approach hit that.
Tracking "Is this nil or empty at this phase of its life" got
very confusing and is one of the main reasons I'm feeling it's
probably wise to put this whole thing down.

The first time I re-wrote a custom order-preservation feature,
I was like "ugh, golang, but okay, fine".  The second and third
and fourth times I ran into it, and realized not doing the work
again would result in randomized error message orders and muck up
my attempts at unit tests for error paths... I was less happy.
I still don't know what the solution to this is, other than trying
to use the IPLD Schema codegen in a cyclic way, because it *does*
solve these problems.

All of the transformation code in the dmt package which flipped
things into the compiler structures: awful.
To be fair, I expected that to be awful.  But it was really awful.

The "carrier types" stuff all being attached to a Compiler type,
for sheer grouping: extremely questionable.
If we were going to have this architecture of types|compiler|dmt
overall, I'd hands down absolutely full-stop definitely no-question
want the compiler to just be its own package.  Unfortunately,
as you already know, golang: if we want the types metadata types to
be immutable, all the code for building them has to be in-package.
Any form of namespacing at all would be an adequate substitute for
a full-on package boundary.  Possibly even just a feature that allowed
some things in a package to be grouped together in the *documentation*
would be satiating!

The "compiler_rules.go" file: really quite good!
Not entirely certain of the "first rule in a set to flunk causes
short-circuit" tactic, because there's at least one case where that
resulted in under-informative error messages that could've reasonably
identified two issues in one pass if not for the short-circuit.
But otherwise, the table-driven approach seemed promising.
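
For readers unfamiliar with the tactic, here is a hypothetical sketch of table-driven rules where the first failure in a set short-circuits the rest of that set. The types and rules below are invented for illustration and are not taken from "compiler_rules.go".

```go
package main

import (
	"errors"
	"fmt"
)

// typeDecl is a hypothetical, minimal type declaration to validate.
type typeDecl struct {
	name   string
	kind   string
	fields []string
}

// rule pairs a human-readable description with a check.
type rule struct {
	desc  string
	check func(t typeDecl) error
}

// ruleSets groups rules. Within a set, later rules may assume that earlier
// ones passed, which is why a failure stops the rest of that set.
var ruleSets = [][]rule{
	{
		{"name must not be empty", func(t typeDecl) error {
			if t.name == "" {
				return errors.New("type has no name")
			}
			return nil
		}},
		{"name must start with an uppercase letter", func(t typeDecl) error {
			// Safe to index: the previous rule guarantees a non-empty name.
			if t.name[0] < 'A' || t.name[0] > 'Z' {
				return fmt.Errorf("got %q", t.name)
			}
			return nil
		}},
	},
	{
		{"struct must have at least one field", func(t typeDecl) error {
			if t.kind == "struct" && len(t.fields) == 0 {
				return errors.New("no fields declared")
			}
			return nil
		}},
	},
}

// validate runs every set and reports at most one error per set.
func validate(t typeDecl) []error {
	var errs []error
	for _, set := range ruleSets {
		for _, r := range set {
			if err := r.check(t); err != nil {
				errs = append(errs, fmt.Errorf("%s: %w", r.desc, err))
				break // first failing rule short-circuits the rest of this set
			}
		}
	}
	return errs
}

func main() {
	for _, err := range validate(typeDecl{name: "", kind: "struct"}) {
		fmt.Println(err)
	}
}
```

The cost mentioned above is visible in this shape: two independent problems that happen to live in the same set are reported one at a time.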

Holistically: it seemed like having a "compiler" system that was
separate from the "dmt" data holder types would be a nice separation
of concerns.  When actually writing it: no, it was not.  The compiler
system ended up needing to pass through almost the exact range of
semantics that the dmt expresses (whether good or bad), so that when
the validator system was built on the compiler's data types, it could
provide reasonable responses to any issues in the data that might've
originated from the dmt format (and in turn this ultimately matters
for end-user experience).  With that degree of coupling, the compiler
system ends up forced to have a very limited and sharp-edged API
that's not at all natural, and certainly doesn't add any value.
It's possible this is inevitable (we didn't start this quest in order
to get a nice golang API, we started it to get rid of a dang import
cycle!), but it definitely generated pain.

I'm frankly not sure what the path forward here is.  Having the
validation logic attach to the dmt would solve the coupling pressure;
but the tradeoff would be there's no validation logic on the
constructors which produce the in-memory type info!  Is that okay?
I dunno.  It would definitely make me grit my teeth, but maybe it's one
of the least bad options left, seeing as how many other angles of
attack seem to have turned pretty sour now.
Another angle that deserves more thought is cyclebreaking by removing
self-description from generated types (this gets a mention in the
hackme doc in the diff already, but perhaps isn't studied far enough).

Whatever it is: it's going to start as a new body of work.
This well here is dry.

Also documented and discussed on the web at
https://github.com/ipld/go-ipld-prime/pull/144 .

7d918468

schema compiler: last gasp of attempting this refactor. · f0f5a630

Eric Myhre authored Feb 25, 2021

I'm about to call it quits on this.  I'm not sure exactly where this
got off the rails, but I'm not happy about how it's going, and
with this diff, I've reached enough "huh,hmm" moments that I think
it's going to end up being less work restarting on a cleaner approach
than it's going to be work finishing this, fixing all the bugs
resulting from the mess of maybeism, and then maintaining it.

Comments in the diff body show the exact moment of my exasperation
reaching a critical threshold.

I'm really not happy with the golang typesystem today.

A more systematic review of this stack of diffs will follow in the
subsequent commit message.  It will be a merge-ignore commit.

f0f5a630