  1. 16 Dec, 2020 1 commit
    • all: rewrite interfaces and APIs to support int64 · f6e9a891
      Daniel Martí authored
      We only supported representing Int nodes as Go's "int" builtin type.
      This is fine on 64-bit, but on 32-bit, it limited those node values to
      just 32 bits. This is a problem in practice, because it's reasonable to
      want more than 32 bits for integers.
      
      Moreover, this meant that IPLD would change behavior if built for a
      32-bit platform; it would not be able to decode large integers, for
      example, when in fact that was just a software limitation that 64-bit
      builds did not have.
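
      For illustration (this snippet is mine, not part of the change): a
      value that needs more than 32 bits is fine in an int64, but is
      silently truncated by a conversion to "int" on a 32-bit build.

      	package main

      	import "fmt"

      	func main() {
      		var big int64 = 5_000_000_000 // needs 33 bits
      		// 64-bit builds print 5000000000; on 32-bit builds "int"
      		// is 32 bits wide and the conversion silently truncates.
      		fmt.Println(int(big))
      	}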
      
      To fix this problem, consistently use int64 for AsInt and AssignInt.
      
      A lot more functions are part of this rewrite as well; mainly, those
      revolving around collections and iterating. Some might never need more
      than 32 bits in practice, but consistency and portability are preferred.
      Moreover, many are interfaces, and we want IPLD interfaces to be
      flexible, which will be important for ADLs.
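
      An abridged sketch of the resulting shapes (the real Node and
      NodeAssembler interfaces carry many more methods than shown here):

      	package nodesketch

      	// Abridged sketch; not the full go-ipld-prime interfaces.
      	type Node interface {
      		AsInt() (int64, error)
      		Length() int64
      		LookupByIndex(idx int64) (Node, error)
      	}

      	type NodeAssembler interface {
      		AssignInt(i int64) error
      	}

      	type ListIterator interface {
      		Next() (idx int64, value Node, err error)
      		Done() bool
      	}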
      
      Below are some GNU sed lines which can be used to quickly update
      function signatures to use int64:
      
      	sed -ri 's/(func.* AsInt.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* AssignInt.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* Length.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* LookupByIndex.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* Next.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* ValuePrototype.*)\<int\>/\1int64/g' **/*.go
      
      Note that the function bodies, as well as the code that calls said
      functions, may need to be manually updated with the integer type change.
      That cannot be automated, because an automated fix could silently
      introduce unhandled overflows.
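
      For example (hypothetical helper, not from this change): a call site
      that still needs a machine-sized int has to guard the narrowing
      conversion explicitly.

      	package overflowcheck

      	import "fmt"

      	// toInt narrows an int64 to a machine-sized int, failing instead
      	// of silently truncating.  The round-trip check only trips on
      	// platforms where "int" is narrower than int64 (32-bit builds).
      	func toInt(n int64) (int, error) {
      		if int64(int(n)) != n {
      			return 0, fmt.Errorf("value %d overflows int", n)
      		}
      		return int(n), nil
      	}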
      
      Some TODOs and FIXMEs for overflow checks are removed, since we remove
      some now unnecessary int64->int conversions. On the other hand, the
      older codecs based on refmt need to gain some overflow check TODOs,
      since refmt uses ints. That is okay for now, since we'll phase out refmt
      pretty soon.
      
      While at it, update codectools to use int64 for token Length fields, so
      that it properly supports full IPLD integers without machine-dependent
      behavior or overflow checks. The budget integer is also updated to be
      int64, since the lengths it uses are now int64.
      
      Note that this refactor needed changes to the Go code generator as well
      as some of the tests, for the purpose of updating all the code.
      
      Finally, note that the code-generated iterator structs do not use int64
      fields internally, even though they must return int64 numbers to
      implement the interface. This is because they use the numeric fields to
      count up to a small finite amount (such as the number of fields in a Go
      struct), or up to the length of a map/slice. Neither of them can ever
      outgrow "int".
      
      Fixes #124.
  2. 14 Nov, 2020 2 commits
    • Add budget parameter to TokenReader. · 33fb7d98
      Eric Myhre authored
      There were already comments about how this would be "probably"
      necessary; I don't know why I wavered; it certainly is.
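
      A plausible shape for the change (hypothetical; the exact signature
      in the repo may differ): the reader takes a budget it decrements as
      it consumes input, so callers can bound the work of a single pull.

      	package budgetsketch

      	// Token is a stand-in; this sketch is not the repo's definitions.
      	type Token struct{}

      	// Before: type TokenReader func() (next *Token, err error)
      	// After: the budget lets callers cap how much input a single
      	// pull of the reader may consume.
      	type TokenReader func(budget *int64) (next *Token, err error)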
    • Fresh take on codec APIs, and some tokenization utilities. · 1da7e2dd
      Eric Myhre authored
      The tokenization system may look familiar to refmt's tokens -- and
      indeed it surely is inspired by and in the same pattern -- but it
      hews a fair bit closer to the IPLD Data Model definitions of kinds,
      and it also includes links as a token kind.  Presence of link as
      a token kind means that if we build codecs around these, the handling
      of links will be better and more consistently abstracted (the
      current dagjson and dagcbor implementations are instructive for what
      an odd mess it is when you have most of the tokenization happen
      before you get to the level that figures out links; I think we can
      improve on that code greatly by moving the barriers around a bit).
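
      A sketch of what such a token kind set looks like (hypothetical
      names; the point is that Link sits alongside the Data Model kinds):

      	package tokensketch

      	// Hypothetical names: one token kind per Data Model kind, plus Link.
      	type TokenKind uint8

      	const (
      		TokenKind_MapOpen TokenKind = iota
      		TokenKind_MapClose
      		TokenKind_ListOpen
      		TokenKind_ListClose
      		TokenKind_Null
      		TokenKind_Bool
      		TokenKind_Int
      		TokenKind_Float
      		TokenKind_String
      		TokenKind_Bytes
      		TokenKind_Link // links are tokens too, so codecs see them uniformly
      	)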
      
      I made both all-at-once and pumpable versions of both the token
      producers and the token consumers.  Each are useful in different
      scenarios.  The pumpable versions are probably generally a bit slower,
      but they're also more composable.  (The all-at-once versions can't
      be glued to each other; only to pumpable versions.)
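
      A sketch of the two shapes, and of why only pumpable ones compose
      (hypothetical names, not the repo's API):

      	package pumpsketch

      	type Token struct{} // stand-in

      	// All-at-once: drives the whole loop itself by pushing every
      	// token into a callback.  Two of these can't be glued together,
      	// because each insists on being the driver.
      	type TokenizeAll func(visit func(*Token) error) error

      	// Pumpable producer: yields one token per call.
      	type TokenSource interface {
      		ReadToken() (*Token, error)
      	}

      	// Pumpable consumer: accepts one token per call.
      	type TokenSink interface {
      		WriteToken(*Token) error
      	}

      	// Glue: possible precisely because both sides are pumpable.
      	// (A nil token is this sketch's end-of-stream signal.)
      	func Pump(src TokenSource, dst TokenSink) error {
      		for {
      			tok, err := src.ReadToken()
      			if err != nil {
      				return err
      			}
      			if tok == nil {
      				return nil
      			}
      			if err := dst.WriteToken(tok); err != nil {
      				return err
      			}
      		}
      	}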
      
      Some new and much reduced contracts for codecs are added,
      but not yet implemented by anything in this commit.
      The comments on them are lengthy and detail the ways I'm thinking
      that codecs should be (re)implemented in the future to maximize
      usability and performance and also allow some configurability.
      (The current interfaces "work", but irritate me a great deal every
      time I use them; to be honest, I just plain guessed badly at what
      the API here should be the first time I did it.  Configurability
      should be easy to *not* engage in, but also easy if you do
      (and in particular, not require reaching into *another* library's
      packages to do it!).)  More work will be required to bring this
      to fruition.
      
      It may be particularly interesting to notice that the tokenization
      systems also allow complex keys -- maps and lists can show up as the
      keys to maps!  This is something not allowed by the Data Model (and
      for, dare I say, obvious reasons)... but it's something that's possible
      at the schema layer (e.g. structs with representation strategies that
      make them representable as strings can be used as map keys), so,
      these functions support it.