1. 20 Jul, 2021 3 commits
    • Masih H. Derkani's avatar
      Use `ioutil.TempFile` to simplify file creation in index example · 326783fe
      Masih H. Derkani authored
      Use existing SDK to create temp file in index example.
      326783fe
    • Masih H. Derkani's avatar
      Avoid writing to files in testdata · c3a59556
      Masih H. Derkani authored
      Avoid writing to files in `testdata` through examples. This is because
      when tests that use those file are run in parallel will fail if files
      are modified. Instead write to a temporary file in examples, and
      whenever opening testdata files in `RW` mode, make a temporary copy.
      
      Thanks to @mvdan for pointing this out.
      
      Fixes #175
      c3a59556
    • Daniel Martí's avatar
      blockstore: implement UseWholeCIDs · c2e497e2
      Daniel Martí authored
      Update the tests too, as a few expected whole CIDs.
      For now, just make them use the option.
      
      This required index.Index to gain a new method, GetAll, so that when
      using whole CIDs, methods like Get can return true if any of the
      matching indexed CIDs are an exact whole-CID match, and not just looking
      at the first matching indexed CID.
      
      GetAll is akin to an iteration over all matching indexed CIDs, and its
      callback returns a boolean to say if the iteration should continue.
      This allows stopping as soon as we're done.
      
      We also remove Index.Get, instead replacing it with a helper called
      GetFirst, which simply makes simple uses of GetAll a single line.
      
      We remove the non-specced and unused index implementations, too.
      They were left in place in case they were useful again,
      but they haven't been so far, and their code is still in git.
      Keeping them around just means updating more code when refactoring.
      
      While at it, make ZeroLengthSectionAsEOF take a boolean and return an
      option, just like the other boolean options, for consistency.
      
      Fixes #130.
      c2e497e2
  2. 16 Jul, 2021 37 commits
    • Daniel Martí's avatar
      Merge wip-v2 into master (#178) · 0ae4f9c0
      Daniel Martí authored
      We've been working on CARv2 in the wip-v2 branch for some weeks.
      
      In preparation for the first v2 beta, merge all of those commits into master.
      
      The commits are mostly as they were originally, minus some minor squashing to clean up the history.
      0ae4f9c0
    • Daniel Martí's avatar
    • Masih H. Derkani's avatar
      Add example blockstore examples · ae4ddd41
      Masih H. Derkani authored
      Implement examples that show how to open a read-only blockstore, and a
      read-write blockstore along with resumption from the same file.
      
      The example surfaced a race condition where if the `AllKeysChan` is
      partially consumed and file is closed then the gorutie started by chan
      population causes the race condition. Locking on Close will make the
      closure hang. Better solution is needed but choices are limited due to
      `AllKeysChan` signature.
      
      Fixes:
      - https://github.com/ipld/go-car/issues/124
      ae4ddd41
    • Masih H. Derkani's avatar
      Remove `index.Save` API · e45d3b24
      Masih H. Derkani authored
      It's two lines of code extra and we already have read from and write to
      APIs for index.
      
      Remove `hydrate` cli since the assumptions it makes about appending
      index are no longer true. header needs updating for example. Removing to
      be re-added when needed as part of a propper CARv2 CLI.
      e45d3b24
    • Masih H. Derkani's avatar
      Replace a file if exists on `index.Save` · 64bb51b2
      Masih H. Derkani authored
      If appendage is needed there is already `car.AttachIndex`.
      64bb51b2
    • Masih H. Derkani's avatar
      Rename `CarV1` with `Data` in API to be consistent with spec · aa62719d
      Masih H. Derkani authored
      Rename terminology to match what the spec uses to describe the inner
      CARv1 payload, i.e. Data payload.
      
      Update docs to use CARv1 and CARv2 consistently.
      
      Fix typo in Options API.
      
      See:
      - https://ipld.io/specs/transport/car/carv2/
      aa62719d
    • Masih H. Derkani's avatar
      Remove memory intensive writer and add warping tests · 0f79c3e8
      Masih H. Derkani authored
      Remove the memory intensive CARv2 writer implementation since it needs
      to be reimplemented anyway and not used. We will release writers if
      needed when it comes to SelectiveCar writer implementation. Just now it
      is not clear if we want a CARv2 writer that works with the deprecated
      `format.NodeGetter`.
      
      Add tests for V1 wrapping functionality.
      
      Reformat code in repo for consistency using `gofumpt`.
      0f79c3e8
    • Masih H. Derkani's avatar
      Implement examples and tests for `index` package · 0aefa5b1
      Masih H. Derkani authored
      Add examples that show how to read and write an index to/from file.
      
      Test marshalling and unmarshalling index files.
      
      Run `gofumt` on repo.
      0aefa5b1
    • Daniel Martí's avatar
      unify options and add more blockstore options · 7d8f54ff
      Daniel Martí authored
      We've agreed to have unified options, since many will be shared between
      the root and blockstore packages.
      
      Include docs, and update the tests.
      7d8f54ff
    • Masih H. Derkani's avatar
      Refactor utility io conversions into one internal package · f3fc5958
      Masih H. Derkani authored
      Improve reader type conversion by checking if type satisfies
      ReaderAt to avoid unnecessary wrapping.
      
      Move io converters into one place.
      
      Fixes:
      - https://github.com/ipld/go-car/issues/145
      f3fc5958
    • Daniel Martí's avatar
      update go-cid to get the CidFromReader panic fix · 6434cd83
      Daniel Martí authored
      Fixes #169.
      6434cd83
    • Masih H. Derkani's avatar
      Remove duplicate test CARv1 file · fd226df9
      Masih H. Derkani authored
      Remove the duplicate test CARv1 file that existed both in `testdata` and
      `blockstore/testdata` in favour of the one in root.
      
      Reflect change in tests.
      
      Duplicity checked using `md5` digest:
      
      ```sh
      $ md5 testdata/sample-v1.car
      MD5 (testdata/sample-v1.car) = 14fcffc271e50bc56cfce744d462a9bd
      
      $ md5 blockstore/testdata/test.car
      MD5 (blockstore/testdata/test.car) = 14fcffc271e50bc56cfce744d462a9bd
      ```
      fd226df9
    • Masih H. Derkani's avatar
      Fix index not found error not being propagated · 7751d8b9
      Masih H. Derkani authored
      Fix an issue where single width index not finding a given key returns
      `0` offset instead of `index.ErrNotFound`.
      
      Reflect the changes in blockstore to return appropriate IPFS blockstore
      error.
      
      Add tests that asserts error types both in index and blockstore
      packages.
      
      Remove redundant TODOs.
      
      Relates to:
      - https://github.com/ipld/go-car/pull/158
      7751d8b9
    • Masih H. Derkani's avatar
      Use consistent import and bot CID versions in testing · 25dc0003
      Masih H. Derkani authored
      Use both CID v0 and v1 in testing `ReadWrite` blockstore.
      
      Use consistent import package name `merkledag` across tests.
      25dc0003
    • Masih H. Derkani's avatar
      Improve test coverage for `ReadWrite` blockstore · dbdb7428
      Masih H. Derkani authored
      Add tests that asserts:
       - when padding options are set the payload is as
      expected
       - when finalized, blockstore calls panic
       - when resumed from mismatching padding error is as expected
       - when resumed from non-v2 file error is as expected
      
      Remove redundant TODOs in code
      dbdb7428
    • Daniel Martí's avatar
      make all "Open" APIs use consistent names · 4827ee39
      Daniel Martí authored
      All the "New" APIs take IO interfaces, so they don't open any file by
      themselves. However, the APIs that take a path to disk and return a
      blockstore or a reader need closing, so the prefix "Open" helps clarify
      that.
      
      Plus, it makes names more consistent.
      4827ee39
    • Daniel Martí's avatar
      use the now-merged cid.CidFromReader · a2cd4640
      Daniel Martí authored
      Fixes #91.
      a2cd4640
    • Masih H. Derkani's avatar
      Unexport unused `Writer` API until it is re-implemented · 1a14cefa
      Masih H. Derkani authored
      Unexport the CARv2 writer API until it is reimplemented using WriterAt
      or WriteSeeker to be more efficient. This API is not used anyway and we
      can postpone releasing it until SelectiveCar writer API is figured out.
      For now we unexport it to carry on with interface finalization and alpha
      release.
      1a14cefa
    • Masih H. Derkani's avatar
      Implement index generation using only `io.Reader` · 6b085bcb
      Masih H. Derkani authored
      Implement the ability to generate index from a CARv1 payload given only
      an `io.Reader`, where the previous implementation required
      `io.ReadSeeker`. The rationale is to be minimal in what we expect in the
      API, since index generation from a CARv1 payload never need to rewind
      the reader and only moves forward in the stream.
      
      Refactor utility IO functions that convert between types in one place.
      Implement constructor functions that only instantiate wrappers when the
      passed argument does not satisfy a required interface.
      
      Fixes:
      - https://github.com/ipld/go-car/issues/146
      
      Relates to:
      - https://github.com/ipld/go-car/issues/145
      6b085bcb
    • Masih H. Derkani's avatar
      Remove dependency to `bufio.Reader` in internal `carv1` package · cc1e449c
      Masih H. Derkani authored
      Remove dependency to `bufio.Reader` in internal `carv1` package that
      seems to be mainly used for peeking a byte to return appropriate error
      when stream abruptly ends, relating to #36. This allows simplification
      of code across the repo and remove all unnecessary wrappings of
      `io.Reader` with `bufio.Reader`. This will also aid simplify the
      internal IO utilities which will be done in future PRs. For now we
      simply remove dependency to `bufio.Reader`
      
      See:
      - https://github.com/ipld/go-car/pull/36
      cc1e449c
    • aarshkshah1992's avatar
      index error should return not found · 62be0f34
      aarshkshah1992 authored
      62be0f34
    • Masih H. Derkani's avatar
      Support resumption in `ReadWrite` blockstore for finalized files · 96e8614d
      Masih H. Derkani authored
      Support resumption from a complete CARv2 file, effectively implementing
      append block to a CARv2 file. Note, on resumption the entire file is
      re-indexed. So it is undesirable to do this for complete cars.
      
      Fix a bug where `AllKeysChan` failed on files that contained the index
      and resumed from. Fix is done by truncating the CARv2 file, removing the
      index if it is present when blockstore is constructed when resuming.
      This is because if `AllKeysChan` is called on a file that contains the
      index, the ReadOnly backing will continue to read until EOF which will
      result in error. Since file is re-indexed on resumption and we require
      finalize to be called we truncate the file to remove index.
      
      Write header on every put so that resuming from finalized files that put
      more blocks without then finalizing works as expected. Otherwise,
      because the last put did not finalize and header is not updated any
      blocks put will effectively be lost.
      
      Add tests that assert resume with and without finalization. Add tests
      that assert read operations work on resumed read-write blockstore.
      
      Fixes: https://github.com/ipld/go-car/issues/155
      
      Address review comments
      
      Avoid having to write car v2 header on every put by unfinalizing the
      blockstore first if resumed from a finalized file.
      
      Fix random seed for tests.
      96e8614d
    • Masih H. Derkani's avatar
      Move index generation functionality to root v2 package · 24b82536
      Masih H. Derkani authored
      Move index generation functionality to root, because they read/write CAR
      files. This leaves the `index` package to only contain the index
      implementation itself.
      
      Move `index.Attach` to root while at it following the same logic. Add
      TODOs to the implementation of Attach to make it more robust. As is it
      _could_ result in corrupt v2 files if offset includes padding since v2
      header is not updated in this function.
      
      Add tests to generate index and move existing tests to correspond to the
      go file containing generation functionality.
      24b82536
    • Masih H. Derkani's avatar
      Remove redundant seek before read or generate index · cb2b58de
      Masih H. Derkani authored
      Assume the read seeker passed in is seeked to the beginning of CAR
      payload. This is both for consistency of API and avoid unnecessary
      seeks when the reader may already be at the right place.
      
      Address review comments
      
      * Use the file already open to get the stats for resumption purposes. On
      getting the stats, because the open flag contains `os.O_CREATE` check
      the size of the file as a way to determine wheter we should attempt
      resumption or not. In practice this is the same as the code that
      explicitly differentiates from non-existing files, since file existence
      doesn't matter here; the key thing is the file content. Plus this also
      avoids unnecessary errors where the files exists but is empty.
      
      * Add TODOs in places to consider enabling deduplication by default and
      optimise computational complexity of roots check. Note, explicitly
      enable deduplication by default in a separate PR so that the change is
      clearly communicated to dependant clients in case it causes any
      unintended side-effects.
      
      * Rename `Equals` to `Matches` to avoid confusion about what it does.
      
      Relates to:
      - https://github.com/ipld/go-car/pull/147#discussion_r668587909
      - https://github.com/ipld/go-car/pull/147#discussion_r668603050
      - https://github.com/ipld/go-car/pull/147#discussion_r668609927
      cb2b58de
    • Masih H. Derkani's avatar
      Unexport unecesserry index APIs and integrate with mulitcodec · 2b593c11
      Masih H. Derkani authored
      Use the codec dedicated to CAR index sorted when marshalling and
      unmarshalling `indexSorted`. Note, the code depends on a specific commit
      of `go-multicodec` `master` branch. This needs to be replaced with a tag
      once a release is made on the go-multicodec side later.
      
      Unexport index APIs that should not be exposed publicly. Remove
      `Builder` now that it is not needed anywhere. Move `insertionIndex` into
      `blockstore` package since that's the only place it is used.
      
      Introduce an index constructor that takes multicodec code and
      instantiates an index.
      
      Fix ignored errors in `indexsorted.go` during marshalling/unmarshlling.
      
      Rename index constructor functions to use consistent terminology; i.e.
      `new` instead if `mk`.
      
      Remove redundant TODOs in code.
      
      Relates to:
      - https://github.com/multiformats/go-multicodec/pull/46
      
      Address review comments
      
      * Rename constructor of index by codec to a simpler name and update
      docs.
      
      * Use multicodec.Code as constant instead of wrappint unit64 every time.
      2b593c11
    • Masih H. Derkani's avatar
      Assert write operations on ReadOnly blockstore panic · 224cd8fd
      Masih H. Derkani authored
      Update typo in docs while at it.
      224cd8fd
    • Masih H. Derkani's avatar
      Implement more tests for `ReadOnly` blockstore · ba7ed7c9
      Masih H. Derkani authored
      Implement tests that check `ReadOnly` blockstore functions work based on
      data payload originated from a V1 or V2 formatted CAR file.
      
      Implement tests that assert instantiation of `ReadOnly` blockstore
      fails on unexpected pragma version.
      ba7ed7c9
    • Masih H. Derkani's avatar
      Fix bug in ReadOnly blockstore GetSize · 7373fe29
      Masih H. Derkani authored
      Fix a bug where the returned size in `ReadOnly.GetSize` blockstore was
      incorrect and the CID was being read from the wrong offset.
      
      Implement tests to assert returned size matches the block raw data size.
      
      Fixes #133
      7373fe29
    • Masih H. Derkani's avatar
      Remove resumption option for automatic resumption when file exists · eca650ea
      Masih H. Derkani authored
      When file exists, attempt resumption by default. This seems like a more
      user-friendly API and less requirements to explicitly tune knobs.
      
      Explicitly fail of the file is finalized. We technically can append to a
      finalized file but that involves changes in multiple places in reader
      abstractions. Left a TODO as a feature for future when asked for.
      
      Close off file if construction of ReadWrite blockstore fails after file
      is opened. If left open, it causes test failures when t cleans up temp
      files.
      eca650ea
    • Masih H. Derkani's avatar
      Avoid creating files on resumption in`ReadWrite` blockstore · 8c08fbfe
      Masih H. Derkani authored
      When resumption is enabled, we do not want to create a file if it does
      not exist. Similarly, when resumption is disabled, fail if the given
      file at path already exists to avoid unintended file overrides and
      creations.
      8c08fbfe
    • Masih H. Derkani's avatar
      Implement ReadWrite blockstore resumption · 2a5f8942
      Masih H. Derkani authored
      * Implement an option for read-write blockstore, that if enabled the
      write can resume from where the writer left off. For resumption to work
      the `WithResumption` option needs to be set explicitly. Otherwise, if
      path to an existing file is passed, the blockstore construction will
      return an error. The resumption requires the roots passed to constructor
      as well as padding options to be identical with roots in file.
      Resumption only works on paths where at least V2 pragma and CAR v1
      header was successfully written onto the file. Otherwise an error is
      returned.
      
      * Implement resumption test that verifies files resumed from match
      expected  header, data and index.
      
      * Implement a CAR v1 equals function to check if two given headers are
      identical. This implementation requires exact ordering of root elements.
      A TODO is left to relax the exact ordering requirement.
      
      * Implement Seeker in internal offset writer in order to forward offset
      of CAR v1 writer within a resumed read-write blockstore after
      resumption. The offset of the writer needs to be set to the latest
      written frame in order for consecutive writes to be at the right offset.
      
      * Fix bug in offset read seeker in internal IO, where seek and returned
      position was not normalized by the base. Reflect the fix in read-only
      blockstore AllKeysChan where reader was twisted to work. We now read the
       header to get its size, then seek past it to then iterate over blocks
       to populate the channel.
      
      * Add TODOs in places to make treating zero-length frames as EOF
      optional; See #140 for context.
      
      * Run `gofumpt -l -w .` on everything to maintain consistent formatting.
      
      Address review comments
      
      * Implement equality check for CAR v1 headers where roots in different
        order are considered to be equal.
      
      * Improve resumption docs to clarify what matching roots mean.
      2a5f8942
    • Masih H. Derkani's avatar
      Replace redundant use of RW mutex where a simple mutex would do · a6e8bc1b
      Masih H. Derkani authored
      Simply use mutex where reader will only do reading.
      a6e8bc1b
    • Masih H. Derkani's avatar
      Implement read or generate index from CAR accepting v1 or v2 · fc494d46
      Masih H. Derkani authored
      A user has to undergo a lot of boilerplate code to generate or get an
      index where there is a CAR file, the version for which is unknown.
      Provide a function that accepts either v1 or v2 payload and reads the
      index if present, otherwise generates it for the user. The generated
      index lives entirely in memory and will not depend on the given
      `io.ReadSeeker` to lookup indices.
      
      Implement tests that check index is returned for v1 or v2 files. Add a
      sample file with an imaginary version of 42 for testing purposes to
      assert that the indexing function will error if the given file is not v1
      or v2.
      
      See: https://github.com/ipld/go-car/issues/143
      fc494d46
    • Daniel Martí's avatar
      index: add support for null paddings in Generate · 6765c84a
      Daniel Martí authored
      Fixes #140.
      6765c84a
    • Masih H. Derkani's avatar
      Consistently accept CAR v1 or v2 on `blockstore.OpenReadOnly` · 2776842e
      Masih H. Derkani authored
      Since `blockstore.NewReadOnly` accepts both versions, and the `attach`
      flag is never used in `OpenReadOnly` make things consistent and arguably
      more useful by accepting both v1 and v2 CAR files. This also avoids
      modifying files passed to a read-only API which is always sweet as
      sugar.
      2776842e
    • Masih H. Derkani's avatar
      Accommodate `index.Generate` taking `io.ReadSeeker` · c0023edf
      Masih H. Derkani authored
      Accommodate to the `index.Generate` signature change, done in #136.
      Insead of implementng a new reader that converts `io.ReaderAt` to
      `io.ReadSeaker`, make `internalio.OffsetReader` partially implement
      `Seek`. The "partial" refers to inability to satisfy `Seek` calls with
      `io.SeekEnd` whence since the point of `internalio.OffsetReader` is that
      it needs not to know the total size of a file, i.e. cannot know its end.
      Instead, it panics if `Seek` is called with whence `io.SeekEnd`.
      
      None of this matterns much since it is all placed under internal APIs.
      c0023edf
    • Masih H. Derkani's avatar
      Rename `ReadOnlyOf` to `NewReadOnly` and accept both v1 and v2 payload · 4e863b28
      Masih H. Derkani authored
      Rename the `blockstore.ReadOnlyOf` to `blockstore.NewReadOnly` for
      better readability and consistent naming.
      
      Allow the function to accept both v1 and v2 payloads and "do the right
      thing" depending on the input.
      
      Optionally, allow the caller to supply an index that, if present, will
      overrider any existing index. Otherwise an in-memory index is generated.
      
      Note, `blockstore.OpenReadOnly` only accepts v2 payload. Future PRs will
      change this function to also accept both versions for consistency, and
      drop modification of given path, delegating any writing to the
      write-specific APIs like `index.Attach` and `index.Generate`.
      
      Fixes #128
      4e863b28