Commits · b832b76226edb06012ba6a856b7ae401d6ab387b · ld / go-ld-prime

08 Jun, 2021 1 commit

node/tests: cover yet more interface methods · b832b762

Daniel Martí authored Jun 07, 2021

Including LookupBySegment and LookupByNode for both maps and lists, but
also As<T> for the representation form of all scalar kinds.

While at it, expand the interface implementation assertions, too.

Note that we had to make the node/basic tests external,
for the sake of testing LookupByNode without inserting an import cycle.

b832b762

07 Jun, 2021 1 commit

node/tests: cover more error cases for scalar kinds · dab6ec7c

Daniel Martí authored Jun 04, 2021

In particular, calling non-scalar Node and Assembler methods should
always fail.

While at it, remove some unused code, since unions always have
representation strategies.

Brings up code coverage in bindnode from 72.7% to 75%.

dab6ec7c

03 Jun, 2021 2 commits

node/tests: add more extensive scalar kind tests · b6e2b10f

Daniel Martí authored Jun 03, 2021

It covers AssignKind, AssignNode, and AsKind for every combination of
assembler kind and method.

We also verify that a constructed scalar node behaves the same with
AsKind when using its representation, like the old test.

There's effectively a triple loop as a test table, so the subtest name
has up to three components separated by dashes, such as:

	TestSchema/Scalars/Bytes-AssignNode-String

We also use this test as a demo of quicktest instead of go-wish.

Finally, adapt bindnode to pass these tests just like codegen. This was
mainly a bunch of TODOs in the relevant methods.

b6e2b10f

node/bindnode: start running all schema tests · 6042d4d8

Daniel Martí authored Jun 03, 2021

We add node/tests.SchemaTestAll to simplify this task, meaning we don't
need to duplicate all test func declarations in node/bindnode.
SchemaTestAll is also flexible enough to allow running multiple
sub-tests per schema test in the future.

There were two remaining places in node/tests that still weren't using
ipld.DeepEqual, so fix those.

Finally, bindnode needed a couple of changes to fully support
ipld.DeepEqual. Most notable is iteration over maps, which required a
bit of a refactor to keep ordered keys.

6042d4d8

02 Jun, 2021 1 commit

node/tests: put most of the schema test cases here · 5b6e1d30

Daniel Martí authored Jun 01, 2021

Along with a generic Engine interface, so that they can be reused for
other ipld.Node implementations, such as bindnode.

node/bindnode will start using these in a follow-up commit, since this
one is large enough as is.

Tested that all three forms of testing schema/gen/go still work:

	go test

	CGO_ENABLED=0 go test

	go test -tags=skipgenbehavtests

5b6e1d30

25 May, 2021 1 commit

node/bindnode: start of a reflect-based Node implementation · 806e36fb

Daniel Martí authored May 18, 2021

Lots of TODOs and polishing to do, but it passes the schema/gen/go tests
with minimal changes.

Follow-up commits will continue filling in the gaps and adding
documentation, notably examples. We'll also make the tests run by
default, after a bit of refactoring.

806e36fb

09 Apr, 2021 2 commits

schema/gen/go: apply gofmt automatically · 4bb2f097

Daniel Martí authored Apr 09, 2021

Now that we buffer the output, using go/format is trivial.

This makes the default behavior better, and means not having to use an
extra gofmt go:generate step everywhere.

4bb2f097

schema/gen/go: fix remaining vet warnings on generated code · 2359e698

Daniel Martí authored Apr 09, 2021

And re-generate all code in this module.

This gets us to a point where go-codec-dagpb has zero vet warnings, for
example. schema/dmt still has a few warnings, but those are trickier to
fix, so will require another PR.

2359e698

07 Apr, 2021 1 commit

schema/gen/go: avoid Maybe pointers for small types · f3d42e04

Daniel Martí authored Apr 06, 2021

If we know that a schema type can be represented in Go with a small
amount of bytes, using a pointer to store its "maybe" is rarely a good
idea. For example, an optional string only weighs twice as much as a
pointer, so a pointer adds overhead and will barely ever save any
memory.

Add a function to work out the byte size of a schema.TypeKind, relying
on reflection and the basicnode package. Debug prints are also present
if one wants to double-check the numbers. As of today, they are:

	sizeOf(small): 32 (4x pointer size)
	sizeOf(Bool): 1
	sizeOf(Int): 8
	sizeOf(Float): 8
	sizeOf(String): 16
	sizeOf(Bytes): 24
	sizeOf(List): 24
	sizeOf(Map): 32
	sizeOf(Link): 16

Below is the result on go-merkledag's BenchmarkRoundtrip after
re-generating go-codec-dagpb with this change. Note that the dag-pb
schema contains multiple optional fields, such as strings.

	name         old time/op    new time/op    delta
	Roundtrip-8    4.24µs ± 3%    3.78µs ± 0%  -10.87%  (p=0.004 n=6+5)

	name         old alloc/op   new alloc/op   delta
	Roundtrip-8    6.38kB ± 0%    6.24kB ± 0%   -2.26%  (p=0.002 n=6+6)

	name         old allocs/op  new allocs/op  delta
	Roundtrip-8       103 ± 0%        61 ± 0%  -40.78%  (p=0.002 n=6+6)

Schema typekinds which don't directly map to basicnode prototypes, such
as structs and unions, are left as a TODO for now.

I did not do any measurements to arrive at the magic number of 4x, which
is documented in the code. We might well increase it in the future, with
more careful benchmarking. For now, it seems like a conservative starting
point that should cover all basic types.

Finally, re-generate within this repo.

f3d42e04

18 Jan, 2021 1 commit

schema/gen/go: prevent some unkeyed literal vet errors · bf0cbde7

Daniel Martí authored Jan 10, 2021

In particular, use keys for ipld error structs. These have one field, so
the changes are pretty simple.

Reduces 'go vet ./...' from 2647 lines of output to 2365.

Updates #102.

bf0cbde7

10 Jan, 2021 1 commit

schema/gen/go: please vet a bit more · 6796504d

Daniel Martí authored Jan 06, 2021

In particular, this removes ~50 out of the 2.7k warnings in 'go vet
./...' in this repository. Mainly, the "unreachable code" ones.

This was caused by edge cases in some of the generated code which caused
an unconditional return or panic statement to be followed by other code.
Fix all of them with a bit more template logic.

Some of the Next methods go a bit further. If they serve no purpose as
the switch has no cases to be matched, just unconditionally return an
error. In the future we can perhaps reuse a single function for that.

Finally, I was having a hard time actually following the logic in
kindedUnionNodeAssemblerMethodTemplateMunge, so I've indented the code a
bit to follow the template logic and scoping.

These changes move us towards pleasing vet, which is nice, but also make
the code waste a bit less space.

6796504d

03 Jan, 2021 1 commit
- gengo: support for unions with stringprefix representation. · 4da73fb9
  Eric Myhre authored Jan 03, 2021
  
  4da73fb9
31 Dec, 2020 1 commit

Revert "rename AssignNode to ConvertFrom" · f2323b3f

Eric Myhre authored Dec 31, 2020

This reverts commit 6e6625bd.

Discussed at
https://github.com/ipld/go-ipld-prime/pull/126#issuecomment-753003441

Long story short, the motivations of this rename are good,
but the new name also carries some connotations we're really not sure
about, and so we're going to undo this for now, and continue to think
about it in the future.

f2323b3f

27 Dec, 2020 1 commit
- Update a few more lingering ReprKind references. · 8fa241ea
  Eric Myhre authored Dec 27, 2020
  
  8fa241ea
25 Dec, 2020 1 commit

all: rename schema.Kind to TypeKind, ipld.ReprKind to Kind · 2d7d25c4

Daniel Martí authored Dec 17, 2020

As discussed on the issue thread, ipld.Kind and schema.TypeKind are more
intuitive, closer to the spec wording, and just generally better in the
long run.

The changes are almost entirely automated via the commands below. Very
minor changes were needed in some of the generators, and then gofmt.

	sed -ri 's/\<Kind\(\)/TypeKind()/g' **/*.go
	git checkout fluent # since it uses reflect.Value.Kind

	sed -ri 's/\<Kind_/TypeKind_/g' **/*.go
	sed -i 's/\<Kind\>/TypeKind/g' **/*.go
	sed -i 's/ReprKind/Kind/g' **/*.go

Plus manually undoing a few renames, as per Eric's review.

Fixes #94.

2d7d25c4

17 Dec, 2020 1 commit

all: rename AssignNode to ConvertFrom · 6e6625bd

Daniel Martí authored Dec 16, 2020

This should be more intuitive to Go programmers, since assignments are
generally trivial operations, but conversions imply that extra work
might be needed to adapt the value to fit in the recipient.

The entire change is just:

	sed -ri 's/AssignNode/ConvertFrom/g' **/*.go

Downstream users can very likely use the same line to fix their function
declarations and calls.

Fixes #95.

6e6625bd

16 Dec, 2020 1 commit

all: rewrite interfaces and APIs to support int64 · f6e9a891

Daniel Martí authored Dec 14, 2020

We only supported representing Int nodes as Go's "int" builtin type.
This is fine on 64-bit, but on 32-bit, it limited those node values to
just 32 bits. This is a problem in practice, because it's reasonable to
want more than 32 bits for integers.

Moreover, this meant that IPLD would change behavior if built for a
32-bit platform; it would not be able to decode large integers, for
example, when in fact that was just a software limitation that 64-bit
builds did not have.

To fix this problem, consistently use int64 for AsInt and AssignInt.

A lot more functions are part of this rewrite as well; mainly, those
revolving around collections and iterating. Some might never need more
than 32 bits in practice, but consistency and portability is preferred.
Moreover, many are interfaces, and we want IPLD interfaces to be
flexible, which will be important for ADLs.

Below are some GNU sed lines which can be used to quickly update
function signatures to use int64:

	sed -ri 's/(func.* AsInt.*)\<int\>/\1int64/g' **/*.go
	sed -ri 's/(func.* AssignInt.*)\<int\>/\1int64/g' **/*.go
	sed -ri 's/(func.* Length.*)\<int\>/\1int64/g' **/*.go
	sed -ri 's/(func.* LookupByIndex.*)\<int\>/\1int64/g' **/*.go
	sed -ri 's/(func.* Next.*)\<int\>/\1int64/g' **/*.go
	sed -ri 's/(func.* ValuePrototype.*)\<int\>/\1int64/g' **/*.go

Note that the function bodies, as well as the code that calls said
functions, may need to be manually updated with the integer type change.
That cannot be automated, because it's possible that an automated fix
would silently introduce potential overflows not being handled.

Some TODOs and FIXMEs for overflow checks are removed, since we remove
some now unnecessary int64->int conversions. On the other hand, the
older codecs based on refmt need to gain some overflow check TODOs,
since refmt uses ints. That is okay for now, since we'll phase out refmt
pretty soon.

While at it, update codectools to use int64 for token Length fields, so
that it properly supports full IPLD integers without machine-dependent
behavior and overflow checks. The budget integer is also updated to be
int64, since the lengths it uses are now int64.

Note that this refactor needed changes to the Go code generator as well
as some of the tests, for the purpose of updating all the code.

Finally, note that the code-generated iterator structs do not use int64
fields internally, even though they must return int64 numbers to
implement the interface. This is because they use the numeric fields to
count up to a small finite amount (such as the number of fields in a Go
struct), or up to the length of a map/slice. Neither of them can ever
outgrow "int".

Fixes #124.

f6e9a891

14 Dec, 2020 2 commits
- gendemo: move generate instructions to doc file; also, drop readme and use package docs. · acdc7fb2
  Eric Myhre authored Dec 14, 2020
  
  acdc7fb2
- clean up node/gendemo; generation now has standard triggers. · 7849b87d
  Eric Myhre authored Dec 13, 2020
```
Do regeneratation.  Some diff churn: the codegen doesn't yet jive
smoothly with a recent change made in service to go vet.
We'll want to straighten that out soon, I guess.
```
  7849b87d
04 Dec, 2020 2 commits

all: fix a lot of "unkeyed literal" vet warnings · 354f194f

Daniel Martí authored Dec 01, 2020

Reduces the output of 'go vet ./...' from 374 lines to 96. Many warnings
remain, but I have lost my patience for today.

Most of the changes below were automated, especially the single-line
mixins expressions. Unfortunately, many of the Traits structs required
manual copy-pasting.

354f194f

node/mixins: use simpler filenames · f5f8b0d7
Daniel Martí authored Dec 01, 2020
```
This is the mixins package, so "Mixin" in filenames is redundant.
```
f5f8b0d7

01 Dec, 2020 1 commit

node/gendemo: use the new code generator · 9fe4b32b

Daniel Martí authored Nov 30, 2020

The files in this codegen demo correspond to an older version of the Go
code generator. That's not terrible in itself, but it did make repeated
uses of 'go test' fail:

	$ go test
	testing: warning: no tests to run
	PASS
	ok  	github.com/ipld/go-ipld-prime/node/gendemo	0.112s
	$ go test
	# github.com/ipld/go-ipld-prime/node/gendemo [github.com/ipld/go-ipld-prime/node/gendemo.test]
	./minima.go:10:2: midvalue redeclared in this block
		previous declaration at ./ipldsch_minima.go:13:27
	[...]

9fe4b32b

10 Sep, 2020 1 commit

all: don't use buffers where readers suffice · 8e26c7e2

Daniel Martí authored Sep 07, 2020

Buffers are not a good option for tests if the other side expects a
reader. Otherwise, the code being tested could build assumptions around
the reader stream being a single contiguous chunk of bytes, such as:

	_ = r.(*bytes.Buffer).Bytes()

This kind of hack might seem unlikely, but it's an easy mistake to make,
especially with APIs like fmt which automatically call String methods.

With bytes.Reader and strings.Reader, the types are much more
restricted, so the tests need to be more faithful.

8e26c7e2

05 Sep, 2020 1 commit

New testcase system for exercising typed nodes; Revamp struct tests with it. · 5d732d47

Eric Myhre authored Aug 22, 2020

This new system focuses on table-driven tests, and leans heavily upon
json as a shorthand for expressing fixtures.

It also makes a great deal more effort to exercise the different
features of nodes (and their paired representation nodes) from all
directions at once for each test datum, rather than requring that all
be written out manually.

The result is that the struct tests we've renovated have a lovely
diffstat shrinkage: 111 insertions, 299 deletions...

And yet the smaller line count results in *more* coverage.

(Okay, the linecount increase for the testcase structure and helper
methods is much bigger than the savings in fixture size... but,
only *so far*.  I assume this will continue to pay off in the future.)

Relatedly: a bug in struct map representations has been fixed.
(It was the sibling of 5f589653, embarassingly.)  Thank goodness we now
get proper coverage of this area.

There's a few TODOs left to further expand the exercises, but those
can slot in easily in subsequent commits.  Same goes for further
expansion of usage of this new system.

5d732d47

30 Jul, 2020 2 commits
- ErrNoSuchField now uses PathSegment instead of string. · 174e562d
  Eric Myhre authored Jul 30, 2020
```
This will become relevant when we have structs with tuple representation!
```
  174e562d
- Fix a somewhat horrendous parity break in struct assembly between the... · 5f589653
  Eric Myhre authored Jul 30, 2020
```
Fix a somewhat horrendous parity break in struct assembly between the AssembleEntry shortcut and AssembleKey+AssembleValue.
```
  5f589653
09 Jul, 2020 1 commit

Refactor the schema.Type info to support cycles. · 47abab45

Eric Myhre authored Jul 09, 2020

This is full of mundane plonking away at adding stars and ampersands,
of which approximately none are interesting.

(I'm particularly frustrated by this because these are all placeholder
types, and we're getting *very* close to replacing them, as we get
closer and closer to self-hosting... at which point all of this bonking
about will be made totally irrelevant.  And yet to close the last mile,
these "small" fixes are surprisingly irritating.  Ah well.)

The bits that *are* interesting:

- the Spawn functions for type info now take strings rather than types
(so that they don't provoke a cycle problem for the user when
constructing the information to describe cyclic type info);

- all of the Type info structure hold a pointer to the TypeSystem, and
use that to look up reified Type info for related types, so that their
methods don't force the caller to do that themselves.

(The TypeSystem pointer was already there, amusingly; just never before
initialized, because it hadn't turned out to be load-bearing yet.)

It also would've been possible to just change all the methods on the
Type types to return TypeName rather than full Type info.  That would
avoid the need to use a TypeSystem pointer.  I didn't because:

Overall, this was done in such a way as to minimize the diff that
impacts within the templates.  This was a goal because updating
templates is a fair bit more work than other code due to the weak
compiler support.  And we'll end up reviewing and changing these
methods when we get to our goal of self-hosting generation of the
schema types from the schema-schema, so, it's not worth pushing around
diffs in that same area when they'd be sure to be churned under soon.

47abab45

29 Jun, 2020 2 commits

rename: 'Undefined' is now 'Absent'. · 75d74911
Eric Myhre authored Jun 29, 2020

75d74911

rename: 'NodeStyle' is now 'NodePrototype'. · 8da48d63

Eric Myhre authored Jun 29, 2020

Hopefully this increases clarity and eases comprehension.

Notes and discussion can be found at
https://github.com/ipld/go-ipld-prime/issues/54 (and also
I suppose in some of our weekly video chats, but I'd have
to go on quite a dig to find the relevant links and time).

Many many refernces to 'ns' are also updated to 'np',
making the line count in this diff pretty wild.

8da48d63

26 Jun, 2020 2 commits

rename: Node.Lookup{Foo} methods are now Node.LookupBy{Foo}. · ad14474f
Eric Myhre authored Jun 26, 2020
```
See the changelog for discussion; this had already been on the docket
for a while now.
```
ad14474f

rename: Node.Lookup method is now Node.LookupNode. · b6eb7572

Eric Myhre authored Jun 26, 2020

... and it will further become LookupByNode shortly, but that will be
a separate commit.

See the changelog for discussion; this had already been on the docket
for a while now.

b6eb7572

15 Jun, 2020 1 commit

Regen gendemo package. · 13a3cd38

Eric Myhre authored Jun 15, 2020

(It's a small diff -- just reflecting the fix for map returing absent.)

13a3cd38

03 Jun, 2020 1 commit
- Fix several stale references to old package names! · 1bbdc14a
  Eric Myhre authored Jun 03, 2020
  
  1bbdc14a
22 May, 2020 3 commits

gendemo package is now real generation :3 · e9455cdc

Eric Myhre authored May 22, 2020

Previously, it was manually written prototypes of what gen "would" look like.

Now it's the real deal :3

e9455cdc

Regen the realgen package. · 0009613a

Eric Myhre authored May 22, 2020

Going to move it over to replace the (currently hand-written) gendemo
package shortly... but spread that over a few commits, in case the
diffs turn out interesting to look at.

0009613a

Refresh HACKME documents. · a7e8b2f2

Eric Myhre authored May 22, 2020

Some of them still lived over the "gendemo" package, and those are now
moved over here to the proper gen package.

Linked to more of the other documents from the main HACKME doc.

a7e8b2f2

26 Apr, 2020 1 commit
- Add more test specs for list and map nesting. · 50e2df1e
  Eric Myhre authored Apr 27, 2020
  
  50e2df1e
24 Apr, 2020 1 commit

List gen support, and tests. · 0827921d

Eric Myhre authored Apr 24, 2020

These are getting increasingly straightforward as most of the hard
problems have been ironed out already when getting the other recursives
(maps and structs) sorted.

Now, we just need to stamp out the rest of the scalars, and I think
this codegen stuff is at least early-alpha level usable!

Also in this commit: added some new error types and fixed up the
basicnode.List implementation to use it, removing some todos there.

0827921d

19 Apr, 2020 1 commit

MapNStrMap3StrInt benchmarks on codegen. · 3b33e05c

Eric Myhre authored Apr 19, 2020

Marshal is on par with basicnode. Both basicnode and then gen stuff
does a solid job of alloc amortization on reads, so the dominant cost
remaining for both is in getting iterators. Thus, they come out
pretty comparable overall.

Unmarshal is winning *nicely* over basicnode. Roughly a third fewer
allocations, and gen is about 125% faster on the clock.

I haven't looked to see if unmarshal can be further improved with any
low-hanging-fruit sorts of fixes. Wouldn't be surprised if it can.

We're gonna need more standard benchmarks... and in particular,
need them working without the marshal/unmarshal indirections.
Those are handy, but add a *lot* of noise from directions we're not
necessarily interested in when looking at different node impls.

3b33e05c

16 Apr, 2020 1 commit

Remove finish callback. Much faster. Bench. · 6d31b15f

Eric Myhre authored Apr 16, 2020

If you've been following along for a while now, you don't need to see
the benchmarks to know what's coming.  The long story short is:
allocations are the root of all evil, and we got rid of some, and now
things are significantly faster.

Here's the numbers:

basicnode (just for a baseline to compare to):

```
BenchmarkMapStrInt_3n_AssembleStandard-8         1988986               588 ns/op             520 B/op          8 allocs/op
BenchmarkMapStrInt_3n_AssembleEntry-8            2158921               559 ns/op             520 B/op          8 allocs/op
BenchmarkMapStrInt_3n_Iteration-8               19679841                67.0 ns/op            16 B/op          1 allocs/op
BenchmarkSpec_Marshal_Map3StrInt-8               1377094               870 ns/op             544 B/op          7 allocs/op
BenchmarkSpec_Marshal_Map3StrInt_CodecNull-8     4560031               278 ns/op             176 B/op          3 allocs/op
BenchmarkSpec_Unmarshal_Map3StrInt-8              368763              3239 ns/op            1608 B/op         32 allocs/op
```

realgen, previously, using fcb:

```
BenchmarkMapStrInt_3n_AssembleStandard-8         4293072               278 ns/op             208 B/op          5 allocs/op
BenchmarkMapStrInt_3n_AssembleEntry-8            4643892               259 ns/op             208 B/op          5 allocs/op
BenchmarkMapStrInt_3n_Iteration-8               20307603                59.9 ns/op            16 B/op          1 allocs/op
BenchmarkSpec_Marshal_Map3StrInt-8               1346115               913 ns/op             544 B/op          7 allocs/op
BenchmarkSpec_Marshal_Map3StrInt_CodecNull-8     4606304               256 ns/op             176 B/op          3 allocs/op
BenchmarkSpec_Unmarshal_Map3StrInt-8              425662              2793 ns/op            1160 B/op         27 allocs/op
```

realgen, new, improved:

```
BenchmarkMapStrInt_3n_AssembleStandard-8         6138765               183 ns/op             129 B/op          3 allocs/op
BenchmarkMapStrInt_3n_AssembleEntry-8            7276795               176 ns/op             129 B/op          3 allocs/op
BenchmarkMapStrInt_3n_Iteration-8               19593212                67.2 ns/op            16 B/op          1 allocs/op
BenchmarkSpec_Marshal_Map3StrInt-8               1309916               912 ns/op             544 B/op          7 allocs/op
BenchmarkSpec_Marshal_Map3StrInt_CodecNull-8     4579935               257 ns/op             176 B/op          3 allocs/op
BenchmarkSpec_Unmarshal_Map3StrInt-8              465195              2599 ns/op            1080 B/op         25 allocs/op
```

So!  About 150% improvement on assembly between gen with fcb and our new-improved no-callback system.

And about 321% improvement in total now for codegen structs over the basicnode map.

That's the kind of ratio I was looking for :)

As with all of these measurements: these will also get much bigger on bigger corpuses.
Some of the improvements here are O(n) -> O(1), and some apply even more heartily in deeper trees, etc.
But it's telling that even on very small corpuses, the impact is already huge.

6d31b15f