  1. 16 Dec, 2020 1 commit
    • all: rewrite interfaces and APIs to support int64 · f6e9a891
      Daniel Martí authored
      We only supported representing Int nodes as Go's "int" builtin type.
      This is fine on 64-bit, but on 32-bit, it limited those node values to
      just 32 bits. This is a problem in practice, because it's reasonable to
      want more than 32 bits for integers.
      
      Moreover, this meant that IPLD would change behavior if built for a
      32-bit platform; it would not be able to decode large integers, for
      example, when in fact that was just a software limitation that 64-bit
      builds did not have.
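
      For illustration (this snippet is mine, not part of the change): a
      value that needs more than 32 bits is fine in an int64, but is
      silently truncated by a conversion to "int" on a 32-bit build.

      	package main

      	import "fmt"

      	func main() {
      		var big int64 = 5_000_000_000 // needs 33 bits
      		// 64-bit builds print 5000000000; on 32-bit builds "int"
      		// is 32 bits wide and the conversion silently truncates.
      		fmt.Println(int(big))
      	}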
      
      To fix this problem, consistently use int64 for AsInt and AssignInt.
      
      A lot more functions are part of this rewrite as well; mainly, those
      revolving around collections and iterating. Some might never need more
      than 32 bits in practice, but consistency and portability are preferred.
      Moreover, many are interfaces, and we want IPLD interfaces to be
      flexible, which will be important for ADLs.
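
      An abridged sketch of the resulting shapes (the real Node and
      NodeAssembler interfaces carry many more methods than shown here):

      	package nodesketch

      	// Abridged sketch; not the full go-ipld-prime interfaces.
      	type Node interface {
      		AsInt() (int64, error)
      		Length() int64
      		LookupByIndex(idx int64) (Node, error)
      	}

      	type NodeAssembler interface {
      		AssignInt(i int64) error
      	}

      	type ListIterator interface {
      		Next() (idx int64, value Node, err error)
      		Done() bool
      	}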
      
      Below are some GNU sed lines which can be used to quickly update
      function signatures to use int64:
      
      	sed -ri 's/(func.* AsInt.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* AssignInt.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* Length.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* LookupByIndex.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* Next.*)\<int\>/\1int64/g' **/*.go
      	sed -ri 's/(func.* ValuePrototype.*)\<int\>/\1int64/g' **/*.go
      
      Note that the function bodies, as well as the code that calls said
      functions, may need to be manually updated with the integer type change.
      That cannot be automated, because an automated fix could silently
      introduce unhandled overflows.
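
      For example (hypothetical helper, not from this change): a call site
      that still needs a machine-sized int has to guard the narrowing
      conversion explicitly.

      	package overflowcheck

      	import "fmt"

      	// toInt narrows an int64 to a machine-sized int, failing instead
      	// of silently truncating.  The round-trip check only trips on
      	// platforms where "int" is narrower than int64 (32-bit builds).
      	func toInt(n int64) (int, error) {
      		if int64(int(n)) != n {
      			return 0, fmt.Errorf("value %d overflows int", n)
      		}
      		return int(n), nil
      	}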
      
      Some TODOs and FIXMEs for overflow checks are removed, since we remove
      some now unnecessary int64->int conversions. On the other hand, the
      older codecs based on refmt need to gain some overflow check TODOs,
      since refmt uses ints. That is okay for now, since we'll phase out refmt
      pretty soon.
      
      While at it, update codectools to use int64 for token Length fields, so
      that it properly supports full IPLD integers without machine-dependent
      behavior or overflow checks. The budget integer is also updated to be
      int64, since the lengths it uses are now int64.
      
      Note that this refactor needed changes to the Go code generator as well
      as some of the tests, for the purpose of updating all the code.
      
      Finally, note that the code-generated iterator structs do not use int64
      fields internally, even though they must return int64 numbers to
      implement the interface. This is because they use the numeric fields to
      count up to a small finite amount (such as the number of fields in a Go
      struct), or up to the length of a map/slice. Neither of them can ever
      outgrow "int".
      
      Fixes #124.
  2. 14 Nov, 2020 2 commits
    • Add budget parameter to TokenReader. · 33fb7d98
      Eric Myhre authored
      There were already comments about how this would be "probably"
      necessary; I don't know why I wavered; it certainly is.
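
      A plausible shape for the change (hypothetical; the exact signature
      in the repo may differ): the reader takes a budget it decrements as
      it consumes input, so callers can bound the work of a single pull.

      	package budgetsketch

      	// Token is a stand-in; this sketch is not the repo's definitions.
      	type Token struct{}

      	// Before: type TokenReader func() (next *Token, err error)
      	// After: the budget lets callers cap how much input a single
      	// pull of the reader may consume.
      	type TokenReader func(budget *int64) (next *Token, err error)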
    • Fresh take on codec APIs, and some tokenization utilities. · 1da7e2dd
      Eric Myhre authored
      The tokenization system may look familiar to refmt's tokens -- and
      indeed it surely is inspired by and in the same pattern -- but it
      hews a fair bit closer to the IPLD Data Model definitions of kinds,
      and it also includes links as a token kind.  Presence of link as
      a token kind means that if we build codecs around these, the handling
      of links will be better and more consistently abstracted (the
      current dagjson and dagcbor implementations are instructive for what
      an odd mess it is when you have most of the tokenization happen
      before you get to the level that figures out links; I think we can
      improve on that code greatly by moving the barriers around a bit).
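
      A sketch of what such a token kind set looks like (hypothetical
      names; the point is that Link sits alongside the Data Model kinds):

      	package tokensketch

      	// Hypothetical names: one token kind per Data Model kind, plus Link.
      	type TokenKind uint8

      	const (
      		TokenKind_MapOpen TokenKind = iota
      		TokenKind_MapClose
      		TokenKind_ListOpen
      		TokenKind_ListClose
      		TokenKind_Null
      		TokenKind_Bool
      		TokenKind_Int
      		TokenKind_Float
      		TokenKind_String
      		TokenKind_Bytes
      		TokenKind_Link // links are tokens too, so codecs see them uniformly
      	)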
      
      I made both all-at-once and pumpable versions of both the token
      producers and the token consumers.  Each are useful in different
      scenarios.  The pumpable versions are probably generally a bit slower,
      but they're also more composable.  (The all-at-once versions can't
      be glued to each other; only to pumpable versions.)
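
      A sketch of the two shapes, and of why only pumpable ones compose
      (hypothetical names, not the repo's API):

      	package pumpsketch

      	type Token struct{} // stand-in

      	// All-at-once: drives the whole loop itself by pushing every
      	// token into a callback.  Two of these can't be glued together,
      	// because each insists on being the driver.
      	type TokenizeAll func(visit func(*Token) error) error

      	// Pumpable producer: yields one token per call.
      	type TokenSource interface {
      		ReadToken() (*Token, error)
      	}

      	// Pumpable consumer: accepts one token per call.
      	type TokenSink interface {
      		WriteToken(*Token) error
      	}

      	// Glue: possible precisely because both sides are pumpable.
      	// (A nil token is this sketch's end-of-stream signal.)
      	func Pump(src TokenSource, dst TokenSink) error {
      		for {
      			tok, err := src.ReadToken()
      			if err != nil {
      				return err
      			}
      			if tok == nil {
      				return nil
      			}
      			if err := dst.WriteToken(tok); err != nil {
      				return err
      			}
      		}
      	}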
      
      Some new and much reduced contracts for codecs are added,
      but not yet implemented by anything in this commit.
      The comments on them are lengthy and detail the ways I'm thinking
      that codecs should be (re)implemented in the future to maximize
      usability and performance and also allow some configurability.
      (The current interfaces "work", but irritate me a great deal every
      time I use them; to be honest, I just plain guessed badly at what
      the API here should be the first time I did it.  Configurability
      should be easy to *not* engage in, but also easy if you do
      (and in particular, not require reaching into *another* library's
      packages to do it!).)  More work will be required to bring this
      to fruition.
      
      It may be particularly interesting to notice that the tokenization
      systems also allow complex keys -- maps and lists can show up as the
      keys to maps!  This is something not allowed by the Data Model (and
      for, dare I say, obvious reasons)... but it's something that's possible
      at the schema layer (e.g. structs with representation strategies that
      make them representable as strings can be used as map keys), so,
      these functions support it.