- 16 Aug, 2021 1 commit
tavit ohanian authored
- 29 Jul, 2021 1 commit
tavit ohanian authored
- 25 Dec, 2020 1 commit
Daniel Martí authored
As discussed on the issue thread, ipld.Kind and schema.TypeKind are more intuitive, closer to the spec wording, and just generally better in the long run.

The changes are almost entirely automated via the commands below. Very minor changes were needed in some of the generators, and then gofmt.

    sed -ri 's/\<Kind\(\)/TypeKind()/g' **/*.go
    git checkout fluent # since it uses reflect.Value.Kind
    sed -ri 's/\<Kind_/TypeKind_/g' **/*.go
    sed -i 's/\<Kind\>/TypeKind/g' **/*.go
    sed -i 's/ReprKind/Kind/g' **/*.go

Plus manually undoing a few renames, as per Eric's review.

Fixes #94.
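For orientation, here is a minimal sketch of how the renamed identifiers relate after this change. The declarations are illustrative assumptions, not the library's exact source:

    package sketch

    // Data Model kinds: what was ReprKind is now simply Kind.
    type Kind uint8

    const (
        Kind_Invalid Kind = iota
        Kind_Map
        Kind_List
        Kind_Int
        Kind_Link
        // ... remaining Data Model kinds elided
    )

    // Schema-level kinds: what was Kind at the schema layer is now TypeKind.
    type TypeKind uint8

    // A Node reports its Data Model kind via Kind() (formerly ReprKind()).
    type Node interface {
        Kind() Kind
    }

    // A schema Type reports its kind via TypeKind() (formerly Kind()).
    type Type interface {
        TypeKind() TypeKind
    }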
- 16 Dec, 2020 1 commit
Daniel Martí authored
We only supported representing Int nodes as Go's "int" builtin type. This is fine on 64-bit, but on 32-bit, it limited those node values to just 32 bits. This is a problem in practice, because it's reasonable to want more than 32 bits for integers.

Moreover, this meant that IPLD would change behavior if built for a 32-bit platform; it would not be able to decode large integers, for example, when in fact that was just a software limitation that 64-bit builds did not have.

To fix this problem, consistently use int64 for AsInt and AssignInt. A lot more functions are part of this rewrite as well; mainly, those revolving around collections and iterating. Some might never need more than 32 bits in practice, but consistency and portability are preferred. Moreover, many are interfaces, and we want IPLD interfaces to be flexible, which will be important for ADLs.

Below are some GNU sed lines which can be used to quickly update function signatures to use int64:

    sed -ri 's/(func.* AsInt.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* AssignInt.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* Length.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* LookupByIndex.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* Next.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* ValuePrototype.*)\<int\>/\1int64/g' **/*.go

Note that the function bodies, as well as the code that calls said functions, may need to be manually updated with the integer type change. That cannot be automated, because it's possible that an automated fix would silently introduce potential overflows not being handled.

Some TODOs and FIXMEs for overflow checks are removed, since we remove some now-unnecessary int64->int conversions. On the other hand, the older codecs based on refmt need to gain some overflow-check TODOs, since refmt uses ints. That is okay for now, since we'll phase out refmt pretty soon.

While at it, update codectools to use int64 for token Length fields, so that it properly supports full IPLD integers without machine-dependent behavior and overflow checks. The budget integer is also updated to be int64, since the lengths it uses are now int64.

Note that this refactor needed changes to the Go code generator as well as some of the tests, for the purpose of updating all the code.

Finally, note that the code-generated iterator structs do not use int64 fields internally, even though they must return int64 numbers to implement the interface. This is because they use the numeric fields to count up to a small finite amount (such as the number of fields in a Go struct), or up to the length of a map/slice. Neither of them can ever outgrow "int".

Fixes #124.
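The resulting interface surface looks roughly like this. This is a sketch under assumed shapes (the actual interfaces carry more methods), but the signatures track the sed lines above:

    package sketch

    // Before this change, these signatures used Go's "int", whose width
    // is machine-dependent; now they use int64 everywhere.
    type Node interface {
        AsInt() (int64, error)                 // was: AsInt() (int, error)
        Length() int64                         // was: Length() int
        LookupByIndex(idx int64) (Node, error) // was: LookupByIndex(idx int)
    }

    type NodeAssembler interface {
        AssignInt(int64) error // was: AssignInt(int) error
    }

    // Callers converting back down to int must now check for overflow
    // explicitly rather than truncating silently, e.g.:
    //
    //     n := node.Length()
    //     if n > math.MaxInt32 { return errOverflow }
    //     i := int(n)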
- 01 Dec, 2020 2 commits
Eric Myhre authored
I dearly wish this wasn't such a dark art. But I really want these tests, too.
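For reference, the standard library offers one common way to pin allocation counts in Go tests; whether this commit uses it or a different harness isn't shown here, and the operation under test below is a stand-in:

    package sketch

    import "testing"

    // Asserting an exact allocation count with testing.AllocsPerRun.
    // The closure body here is a placeholder for a small-read operation.
    func TestZeroAllocs(t *testing.T) {
        var sink byte
        buf := []byte("hello")
        allocs := testing.AllocsPerRun(100, func() {
            sink += buf[0] // must not allocate
        })
        if allocs != 0 {
            t.Fatalf("expected 0 allocations, got %v", allocs)
        }
        _ = sink
    }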
Eric Myhre authored
The docs in the diff should cover it pretty well. It's a reader-wrapper that does a lot of extremely common buffering and small-read operations that parsers tend to need.

This emerges from some older generation of code in refmt with similar purpose: https://github.com/polydawn/refmt/blob/master/shared/reader.go

Unlike those antecedents, this one is a single concrete implementation, rather than using interfaces to allow switching between the two major modes of use. This is surely uglier code, but I think the result is more optimizable.

The tests include aggressive checks that operations take exactly as many allocations as planned -- and mostly, that's *zero*.

In the next couple of commits, I'll be adding parsers which use this. Benchmarks are still forthcoming. My recollection from the previous bout of this in refmt was that microbenchmarking this type wasn't a great use of time, because when we start benchmarking codecs built *upon* it, and especially, when looking at the pprof reports from that, we'll see this reader showing up plain as day there, and nicely contextualized... so, we'll just save our efforts for that point.
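A rough sketch of the shape being described, with names and details assumed for illustration rather than taken from the actual API:

    package sketch

    import "io"

    // A single concrete reader-wrapper: buffers an io.Reader and offers
    // the small-read operations parsers lean on, with no interface
    // indirection in the hot path.
    type Reader struct {
        r   io.Reader
        buf []byte
        n   int // bytes of buf currently valid
        off int // read position within buf[:n]
    }

    func NewReader(r io.Reader) *Reader {
        return &Reader{r: r, buf: make([]byte, 4096)}
    }

    // Readn1 returns the next single byte, refilling the buffer as needed.
    func (z *Reader) Readn1() (byte, error) {
        if z.off >= z.n {
            if err := z.fill(); err != nil {
                return 0, err
            }
        }
        b := z.buf[z.off]
        z.off++
        return b, nil
    }

    // fill replaces the buffer contents with fresh bytes from the source.
    func (z *Reader) fill() error {
        n, err := z.r.Read(z.buf)
        z.n, z.off = n, 0
        if n > 0 {
            return nil // surface any error on the next refill instead
        }
        if err == nil {
            err = io.ErrNoProgress
        }
        return err
    }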
- 14 Nov, 2020 6 commits
Eric Myhre authored
These aren't exercised yet -- and this is accordingly still highly subject to change -- but so far in developing this package, the pattern has been: every time I say "maybe this should have X", it's turned out it indeed should have X. So let's just do that and then try it out, and have the experimental code instead of the comments.
Eric Myhre authored
Useful for tests that do deep equality tests on structures. Same caveat about current placement of this method as in the previous commit: this might be worth detaching and shifting to a 'codectest' or 'tokentest' package. But let's see how it shakes out.
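A sketch of the idea; the Token shape here is hypothetical and much smaller than the real one:

    package sketch

    import "fmt"

    // A deliberately tiny stand-in for the token type in question.
    type Token struct {
        Kind rune   // e.g. 's' for string, 'i' for int
        Str  string // valid when Kind == 's'
        Int  int64  // valid when Kind == 'i'
    }

    // String makes failed deep-equality assertions print readably,
    // instead of dumping opaque struct internals.
    func (tk Token) String() string {
        switch tk.Kind {
        case 's':
            return fmt.Sprintf("Token{Str:%q}", tk.Str)
        case 'i':
            return fmt.Sprintf("Token{Int:%d}", tk.Int)
        default:
            return fmt.Sprintf("Token{Kind:%q}", tk.Kind)
        }
    }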
Eric Myhre authored
This is far too useful in testing to reproduce in each package that needs something like it. It's already shown up as desirable again as soon as I start implementing even a little bit of even one codec tokenizer, and that's gonna keep happening. This might be worth moving to some kind of a 'tokentest' or 'codectest' package instead of cluttering up this one, but... we'll see; I've got a fair amount more code to flush into commits, and after that we can reshake things and see if packages settle out differently.
Eric Myhre authored
There were already comments about how this would be "probably" necessary; I don't know why I wavered, it certainly is.
Eric Myhre authored
You can write a surprising amount of code where the compiler will shrug and silently coerce things for you. Right up until you can't. (Some test cases that'll be coming down the commit queue shortly happened to end up checking the type of the constants, and, well. Suddenly this was noticeable.)
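The gist, as a small illustration with assumed names rather than the commit's actual diff:

    package main

    // An untyped constant coerces to whatever type context demands...
    const MapOpen = '{' // untyped rune constant

    type TokenKind uint8

    func takeKind(k TokenKind) {}

    func main() {
        takeKind(MapOpen) // fine: silently coerces to TokenKind

        // ...but anything that inspects the type sees the untyped
        // constant's default type (rune), not TokenKind:
        var x interface{} = MapOpen
        _, isKind := x.(TokenKind) // false!
        _ = isKind

        // Declaring the constant with an explicit type removes the ambiguity:
        // const MapOpen TokenKind = '{'
    }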
Eric Myhre authored
The tokenization system may look familiar to refmt's tokens -- and indeed it surely is inspired by and in the same pattern -- but it hews a fair bit closer to the IPLD Data Model definitions of kinds, and it also includes links as a token kind.

Presence of link as a token kind means if we build codecs around these, the handling of links will be better and most consistently abstracted (the current dagjson and dagcbor implementations are instructive for what an odd mess it is when you have most of the tokenization happen before you get to the level that figures out links; I think we can improve on that code greatly by moving the barriers around a bit).

I made both all-at-once and pumpable versions of both the token producers and the token consumers. Each is useful in different scenarios. The pumpable versions are probably generally a bit slower, but they're also more composable. (The all-at-once versions can't be glued to each other; only to pumpable versions.)

Some new and much-reduced contracts for codecs are added, but not yet implemented by anything in this commit. The comments on them are lengthy and detail the ways I'm thinking that codecs should be (re)implemented in the future to maximize usability and performance and also allow some configurability. (The current interfaces "work", but irritate me a great deal every time I use them; to be honest, I just plain guessed badly at what the API here should be the first time I did it. Configurability should be both easy to *not* engage in, but also easier if you do (and in particular, not require reaching to *another* library's packages to do it!).)

More work will be required to bring this to fruition.

It may be particularly interesting to notice that the tokenization systems also allow complex keys -- maps and lists can show up as the keys to maps! This is something not allowed by the data model (and for dare I say obvious reasons)... but it's something that's possible at the schema layer (e.g. structs with representation strategies that make them representable as strings can be used as map keys), so, these functions support it.
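For flavor, a pared-down sketch of the shape being described. All names here are assumptions for illustration; the real declarations live in the repository:

    package sketch

    // One token kind per Data Model kind, plus Link as a first-class
    // token, so link handling doesn't get bolted on after tokenization.
    type TokenKind uint8

    const (
        TokenKind_MapOpen TokenKind = iota
        TokenKind_MapClose
        TokenKind_ListOpen
        TokenKind_ListClose
        TokenKind_Null
        TokenKind_Bool
        TokenKind_Int
        TokenKind_Float
        TokenKind_String
        TokenKind_Bytes
        TokenKind_Link // absent from refmt's tokens; present here
    )

    type Token struct {
        Kind TokenKind
        // ... one value field per kind elided
    }

    // A pumpable consumer: accepts one token per call, so producers and
    // consumers can be glued together. All-at-once variants can drive a
    // pumpable counterpart, but can't be glued to each other directly.
    type TokenSink interface {
        Step(next *Token) (done bool, err error)
    }

    // A pumpable producer: yields one token per call.
    type TokenSource interface {
        Step() (next *Token, done bool, err error)
    }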