- 16 Aug, 2021 1 commit
tavit ohanian authored
-
- 29 Jul, 2021 1 commit
tavit ohanian authored
-
- 16 Dec, 2020 1 commit
Daniel Martí authored
We only supported representing Int nodes as Go's "int" builtin type. This is fine on 64-bit, but on 32-bit, it limited those node values to just 32 bits. This is a problem in practice, because it's reasonable to want more than 32 bits for integers.

Moreover, this meant that IPLD would change behavior if built for a 32-bit platform; it would not be able to decode large integers, for example, when in fact that was just a software limitation that 64-bit builds did not have.

To fix this problem, consistently use int64 for AsInt and AssignInt.

A lot more functions are part of this rewrite as well; mainly, those revolving around collections and iterating. Some might never need more than 32 bits in practice, but consistency and portability are preferred. Moreover, many are interfaces, and we want IPLD interfaces to be flexible, which will be important for ADLs.

Below are some GNU sed lines which can be used to quickly update function signatures to use int64:

    sed -ri 's/(func.* AsInt.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* AssignInt.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* Length.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* LookupByIndex.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* Next.*)\<int\>/\1int64/g' **/*.go
    sed -ri 's/(func.* ValuePrototype.*)\<int\>/\1int64/g' **/*.go

Note that the function bodies, as well as the code that calls said functions, may need to be manually updated with the integer type change. That cannot be automated, because an automated fix could silently introduce unhandled overflows.

Some TODOs and FIXMEs for overflow checks are removed, since we remove some now unnecessary int64->int conversions. On the other hand, the older codecs based on refmt need to gain some overflow check TODOs, since refmt uses ints. That is okay for now, since we'll phase out refmt pretty soon.

While at it, update codectools to use int64 for token Length fields, so that it properly supports full IPLD integers without machine-dependent behavior and overflow checks. The budget integer is also updated to be int64, since the lengths it uses are now int64.

Note that this refactor needed changes to the Go code generator as well as some of the tests, for the purpose of updating all the code.

Finally, note that the code-generated iterator structs do not use int64 fields internally, even though they must return int64 numbers to implement the interface. This is because they use the numeric fields to count up to a small finite amount (such as the number of fields in a Go struct), or up to the length of a map/slice. Neither of them can ever outgrow "int".

Fixes #124.
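
The overflow concern above, where code still has to hand an int64 to an API that takes a plain "int", could be handled with a checked conversion along these lines. This is only a sketch: intFromInt64 is a hypothetical helper name for illustration, not a function from this repository or from refmt.

```go
package main

import "fmt"

// intFromInt64 is a hypothetical helper (not part of the library).
// It narrows an int64 to the platform's int and reports an error when the
// value does not survive the round trip, which can only happen on platforms
// where int is narrower than 64 bits.
func intFromInt64(v int64) (int, error) {
	n := int(v)
	if int64(n) != v {
		return 0, fmt.Errorf("value %d overflows the platform int type", v)
	}
	return n, nil
}

func main() {
	// A small value converts cleanly on every platform.
	n, err := intFromInt64(42)
	fmt.Println(n, err)

	// A value above 32 bits converts on 64-bit builds but is rejected,
	// rather than silently truncated, on 32-bit builds.
	_, err = intFromInt64(1 << 40)
	fmt.Println(err)
}
```

The round-trip comparison avoids depending on platform-specific constants, so the same source behaves correctly on both 32-bit and 64-bit builds.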
-
- 14 Nov, 2020 2 commits
Eric Myhre authored
There were already comments about how this would be "probably" necessary; I don't know why I wavered, it certainly is.
-
Eric Myhre authored
The tokenization system may look familiar to refmt's tokens -- and indeed it surely is inspired by and in the same pattern -- but it hews a fair bit closer to the IPLD Data Model definitions of kinds, and it also includes links as a token kind. Presence of link as a token kind means if we build codecs around these, the handling of links will be better and more consistently abstracted (the current dagjson and dagcbor implementations are instructive for what an odd mess it is when you have most of the tokenization happen before you get to the level that figures out links; I think we can improve on that code greatly by moving the barriers around a bit).

I made both all-at-once and pumpable versions of both the token producers and the token consumers. Each is useful in different scenarios. The pumpable versions are probably generally a bit slower, but they're also more composable. (The all-at-once versions can't be glued to each other; only to pumpable versions.)

Some new and much reduced contracts for codecs are added, but not yet implemented by anything in this commit. The comments on them are lengthy and detail the ways I'm thinking that codecs should be (re)implemented in the future to maximize usability and performance and also allow some configurability. (The current interfaces "work", but irritate me a great deal every time I use them; to be honest, I just plain guessed badly at what the API here should be the first time I did it. Configurability should be easy to *not* engage in, but also easier if you do (and in particular, not require reaching to *another* library's packages to do it!).) More work will be required to bring this to fruition.

It may be particularly interesting to notice that the tokenization systems also allow complex keys -- maps and lists can show up as the keys to maps! This is something not allowed by the data model (and for dare I say obvious reasons)... but it's something that's possible at the schema layer (e.g. structs with representation strategies that make them representable as strings can be used as map keys), so, these functions support it.
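
To illustrate the difference between all-at-once and pumpable styles described above, here is a minimal sketch. Every name in it (Token, TokenKind, produceAll, sumConsumer) is a hypothetical, simplified stand-in invented for this example; none of it is the actual codectools API. It shows why an all-at-once producer pairs naturally with a pumpable consumer: the producer keeps control of the loop, and the consumer only needs to accept one token per call.

```go
package main

import "fmt"

// TokenKind and Token are hypothetical simplifications for illustration only.
type TokenKind int

const (
	ListOpen TokenKind = iota
	ListClose
	Int
)

type Token struct {
	Kind   TokenKind
	Length int64 // meaningful for ListOpen
	Int    int64 // meaningful for Int
}

// produceAll is an "all-at-once" producer: it walks the whole value and
// pushes every token to emit, holding control until it is finished.
func produceAll(values []int64, emit func(Token) error) error {
	if err := emit(Token{Kind: ListOpen, Length: int64(len(values))}); err != nil {
		return err
	}
	for _, v := range values {
		if err := emit(Token{Kind: Int, Int: v}); err != nil {
			return err
		}
	}
	return emit(Token{Kind: ListClose})
}

// sumConsumer is a "pumpable" consumer: the caller feeds it one token per
// Step call, and it keeps its own state between calls.
type sumConsumer struct {
	depth int
	sum   int64
	done  bool
}

func (c *sumConsumer) Step(t Token) error {
	switch t.Kind {
	case ListOpen:
		c.depth++
	case ListClose:
		if c.depth == 0 {
			return fmt.Errorf("unexpected list close")
		}
		c.depth--
		c.done = c.depth == 0
	case Int:
		c.sum += t.Int
	}
	return nil
}

func main() {
	c := &sumConsumer{}
	// The all-at-once producer drives the pumpable consumer.
	if err := produceAll([]int64{1, 2, 3}, c.Step); err != nil {
		panic(err)
	}
	fmt.Println(c.sum, c.done) // prints "6 true"
}
```

Two all-at-once components cannot be connected this way, because each expects to own the loop; at least one side has to be steppable.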
-