1. 30 Aug, 2021 2 commits
  2. 24 Aug, 2021 2 commits
  3. 23 Aug, 2021 2 commits
  4. 16 Aug, 2021 1 commit
  5. 14 Aug, 2021 1 commit
  6. 11 Aug, 2021 2 commits
    • Masih H. Derkani's avatar
      Document performance caveats of `ExtractV1File` and address comments · c514a301
      Masih H. Derkani authored
      Since `CopyFileRange` performance is OS-dependant, we cannot guarantee
      that `ExtractV1File` will keep copies out of user-space. For example, on
      Linux with sufficiently old Kernel the current behaviour will fall back
      to user-space copy. Document this on the function so that it is made
      clear.
      
      Improve benchmarking determinism and error messages.
      
      The benchmark numbers that Daniel obtained on his laptop,
      running Linux 5.13 on an i5-8350U with /tmp being tmpfs,
      are as follows.
      The "old" results are BenchmarkExtractV1UsingReader,
      and the "new" are BenchmarkExtractV1File.
      
      	name         old time/op    new time/op    delta
      	ExtractV1-8    1.33ms ± 1%    1.11ms ± 2%  -16.48%  (p=0.000 n=8+8)
      
      	name         old speed      new speed      delta
      	ExtractV1-8  7.88GB/s ± 1%  9.43GB/s ± 2%  +19.74%  (p=0.000 n=8+8)
      
      	name         old alloc/op   new alloc/op   delta
      	ExtractV1-8    34.0kB ± 0%     1.0kB ± 0%  -97.09%  (p=0.000 n=8+8)
      
      	name         old allocs/op  new allocs/op  delta
      	ExtractV1-8      26.0 ± 0%      23.0 ± 0%  -11.54%  (p=0.000 n=8+8)
      
      So, at least in the case where the filesystem is very fast,
      we can see that the benefit is around 10-20%,
      as well as fewer allocs thanks to not needing a user-space buffer.
      The performance benefit will likely be smaller on slower disks.
      
      For the Linux syscall logic, see:
      - https://cs.opensource.google/go/go/+/refs/tags/go1.16.7:src/internal/poll/copy_file_range_linux.go;drc=refs%2Ftags%2Fgo1.16.7;l=54
      c514a301
    • Masih H. Derkani's avatar
      Implement utility to extract CARv1 from a CARv2 · 81137942
      Masih H. Derkani authored
      Implement `ExtractV1File` where the function takes path to a CARv2 file
      and efficiently extracts its inner CARv1 payload. Note, the
      implementation only supports CARv2 as input and returns a dedicated
      error if the supplied input is already in CARv1 format.
      
      Implement benchmarks comparing extraction using `Reader` vs
      `ExtractV1File`.
      
      Implement tests that assert in-place extraction as well as invalid input
      and both v1/v2 input
      
      Fixes #207
      81137942
  7. 10 Aug, 2021 1 commit
    • Daniel Martí's avatar
      v2/blockstore: add ReadWrite.Discard · 039ddc7c
      Daniel Martí authored
      This allows closing a read-write blockstore without doing the extra work
      to finalize its header and index. Can be useful if an entire piece of
      work is cancelled, and also simplifies the tests.
      
      Also make ReadOnly error in a straightforward way if it is used after
      being closed. Before, this could lead to panics, as we'd attempt to read
      the CARv1 file when it's closed.
      
      Both mechanisms now use a "closed" boolean, which is consistent and
      simpler than checking a header field.
      
      Finally, add tests that ensure both ReadOnly and ReadWrite behave as
      intended once they have been closed.
      
      The tests also uncovered that AllKeysChan would not release the mutex if
      it encountered an error early on. Fix that, too.
      
      While at it, fix some now-obsolete references to panics on unsupported
      or after-close method calls.
      
      Fixes #205.
      039ddc7c
  8. 06 Aug, 2021 1 commit
  9. 05 Aug, 2021 3 commits
    • Daniel Martí's avatar
      re-add root LICENSE file · 2c0f1eee
      Daniel Martí authored
      The last commit did a mv. I intended to do a cp.
      2c0f1eee
    • Daniel Martí's avatar
      v2: stop using a symlink for LICENSE.md · 00a30a95
      Daniel Martí authored
      Go modules has to support many OSs and filesystems,
      so it completely ignores symlinks when creating module zips.
      And parts of the Go ecosystem, like pkg.go.dev, consume those zips.
      
      So this symlink actually makes the module technically non-OSS,
      since it completely lacks a LICENSE file when downloaded.
      pkg.go.dev thus refuses to show its documentation.
      
      One option would be to rename the root LICENSE.md file to just LICENSE,
      since cmd/go treats that file in a special way,
      including it in subdirectory-module zips.
      Unfortunately, we do want the ".md" extension,
      since the file is formatted to outline the double license.
      
      As such, simply copy the file. It should be fine as it's mostly static.
      00a30a95
    • Masih H. Derkani's avatar
      Update the readme with link to examples · 92dec836
      Masih H. Derkani authored
      Add examples link to the README and refine its structure.
      
      List features of CARv2.
      
      Use stronger language to encourage users to use `v2` over `v0`.
      92dec836
  10. 04 Aug, 2021 2 commits
    • Daniel Martí's avatar
      update package godocs and root level README for v2 · 61cfb4db
      Daniel Martí authored
      While at it, make v2's license a symlink, so the two stay in sync.
      The reason we want the v2 module to have its own license is so that its
      module zip includes the file as well.
      
      Besides mentioning v0 and v2, the root README now also links to pkgsite
      and lists Masih and myself as default maintainers, given that we're the
      ones that did major work on this library last.
      
      Fixes #124.
      Fixes #179.
      61cfb4db
    • Daniel Martí's avatar
      blockstore: stop embedding ReadOnly in ReadWrite · 3752fdb3
      Daniel Martí authored
      It has unintended side effects, as we leak the inner ReadOnly value as
      well as its methods to the parent ReadWrite API.
      
      One unintended consequence is that it was possible to "close" a
      ReadWrite, when we only wanted to support finalizing it.
      
      The resumption tests do want to close without finalization,
      for the sake of emulating partially-written CARv2 files.
      Those can hook into the unexported API via export_test.go.
      
      If a downstream really wants to support closing a ReadWrite blockstore
      without finalizing it, two alternatives are outlined in export_test.go.
      3752fdb3
  11. 03 Aug, 2021 1 commit
    • Masih H. Derkani's avatar
      Implement version agnostic streaming CAR block iterator · d0e44a62
      Masih H. Derkani authored
      Implement an iterator over CAR blocks that accepts both v1 and v2. The
      reader only requires io.Reader interface to operate and exposes the
      version and roots of the underlaying CAR file as fields.
      
      Remove `Next` from CARv2 reader now that the iteration functionality is
      provided using `BlockReader`.
      
      Update the benchmarks to reflect the changes and add example usage for
      `NewBlockReader`.
      
      Add tests that assert `BlockReader` behaviour in erroneous and
      successful cases for both underlying CARv1 and CARv2 payload.
      d0e44a62
  12. 02 Aug, 2021 1 commit
    • Daniel Martí's avatar
      blockstore: use errors when API contracts are broken · a6dc547f
      Daniel Martí authored
      It's technically correct to use panics in these situations,
      as the downstream user is clearly breaking the contract documented in
      the API's godoc.
      
      However, all these methods return errors already,
      so panicking is not our only option here.
      
      There's another reason that makes a panic unfortunate.
      ReadWrite.Finalize enables the panic behavior on other methods,
      which makes it much easier to run into in production
      even when the code and its tests work normally.
      
      Finally, there's some precedent for IO interfaces using errors rather
      than panics when they are used after closing.
      For example, os.File.Read returns an error if the file is closed.
      
      Note that this change doesn't make panics entirely impossible.
      The blockstore could still run into exceptional and unexpected panics,
      such as those caused by memory corruption or internal bugs.
      a6dc547f
  13. 30 Jul, 2021 1 commit
    • Daniel Martí's avatar
      add the first read-only benchmarks · 311809b0
      Daniel Martí authored
      BenchmarkReadBlocks uses carv2.OpenReader with its Roots and Next method.
      From six runs on a laptop with a i5-8350U and benchstat:
      
      	name          time/op
      	ReadBlocks-8    633µs ± 2%
      
      	name          speed
      	ReadBlocks-8  824MB/s ± 2%
      
      	name          alloc/op
      	ReadBlocks-8   1.32MB ± 0%
      
      	name          allocs/op
      	ReadBlocks-8    13.5k ± 0%
      
      OpenReadOnlyV1 uses blockstore.OpenReadOnly with its Get method.
      The method is used on all blocks in a shuffled order,
      to ensure that the index is working as intended.
      The input file lacks an index, so we also generate that.
      
      	name              time/op
      	OpenReadOnlyV1-8    899µs ± 1%
      
      	name              speed
      	OpenReadOnlyV1-8  534MB/s ± 1%
      
      	name              alloc/op
      	OpenReadOnlyV1-8   1.52MB ± 0%
      
      	name              allocs/op
      	OpenReadOnlyV1-8    27.2k ± 0%
      
      Both benchmarks use the "sample" v1/v2 CAR files, which weigh about half
      a megabyte. It seems like a good size; a significantly larger CAR file
      would make each benchmark iteration take tens or hundreds of
      milliseconds, making it much slower to obtain many benchmark results,
      since we want at least thousands of iterations to avoid noise.
      311809b0
  14. 25 Jul, 2021 1 commit
  15. 23 Jul, 2021 2 commits
    • Masih H. Derkani's avatar
      Propagate async `blockstore.AllKeysChan` errors via context · de2711c0
      Masih H. Derkani authored
      Implement a mechanism that allows the user to hook an error handling
      function to the context passed to `AllKeysChan`. The function is then
      notified when an error occurs during asynchronous traversal of data
      payload.
      
      Add tests that assert the success and failure cases when error handler
      is set.
      
      Fixes #177
      de2711c0
    • Masih H. Derkani's avatar
      Add zero-length sections as EOF option to internal CARv1 reader · 703b88c2
      Masih H. Derkani authored
      The CARv2 implementation uses an internal fork of CARv1 due to upstream
      dependency issues captured in #104. Propagate the options set in CARv2
      APIs for treatment of zero-lenth sections onto internal packages so that
      APIs using the internal CARv1 reader behave consistently. Use a `bool`
      for setting the option, since it is the only option needed in CARv1.
      
      Update tests to reflect changes.
      
      Add additional tests to internal CARv1 package and ReadOnly blockstore
      that assert option is propagated.
      
      Fixes #190
      703b88c2
  16. 21 Jul, 2021 2 commits
    • Masih H. Derkani's avatar
      Improve error handing in tests · ffcc4b77
      Masih H. Derkani authored
      When error is not expected in tests, assert that there is indeed no
      error.
      ffcc4b77
    • Masih H. Derkani's avatar
      Allow `ReadOption`s to be set when getting or generating index · 396cc228
      Masih H. Derkani authored
      Fix a bug in read options where the value of zero-len option is always
      set to `true` even if the flag passed is false. This meant if the option
      was present it would have always enabled the zero-len as EOF.
      
      Allow the user to specify read options on `ReadOrGenerateIndex`. Pass
      the options onto downstream reader/generators.
      
      Generate a CARv1 file with null padding and check it into `testdata`. In
      addition check in another such file from ignite just to have two
      different samples of this case.
      
      Enhance index generation tests to expect error when zero-len option is
      not set, and generate index when it is for files with null padding.
      
      Make RW blockstore test use the fixed test files from `testdata` instead
      of generating every time.
      396cc228
  17. 20 Jul, 2021 3 commits
    • Masih H. Derkani's avatar
      Use `ioutil.TempFile` to simplify file creation in index example · 326783fe
      Masih H. Derkani authored
      Use existing SDK to create temp file in index example.
      326783fe
    • Masih H. Derkani's avatar
      Avoid writing to files in testdata · c3a59556
      Masih H. Derkani authored
      Avoid writing to files in `testdata` through examples. This is because
      when tests that use those file are run in parallel will fail if files
      are modified. Instead write to a temporary file in examples, and
      whenever opening testdata files in `RW` mode, make a temporary copy.
      
      Thanks to @mvdan for pointing this out.
      
      Fixes #175
      c3a59556
    • Daniel Martí's avatar
      blockstore: implement UseWholeCIDs · c2e497e2
      Daniel Martí authored
      Update the tests too, as a few expected whole CIDs.
      For now, just make them use the option.
      
      This required index.Index to gain a new method, GetAll, so that when
      using whole CIDs, methods like Get can return true if any of the
      matching indexed CIDs are an exact whole-CID match, and not just looking
      at the first matching indexed CID.
      
      GetAll is akin to an iteration over all matching indexed CIDs, and its
      callback returns a boolean to say if the iteration should continue.
      This allows stopping as soon as we're done.
      
      We also remove Index.Get, instead replacing it with a helper called
      GetFirst, which simply makes simple uses of GetAll a single line.
      
      We remove the non-specced and unused index implementations, too.
      They were left in place in case they were useful again,
      but they haven't been so far, and their code is still in git.
      Keeping them around just means updating more code when refactoring.
      
      While at it, make ZeroLengthSectionAsEOF take a boolean and return an
      option, just like the other boolean options, for consistency.
      
      Fixes #130.
      c2e497e2
  18. 16 Jul, 2021 12 commits