- 16 Jul, 2021 40 commits
-
-
Masih H. Derkani authored
Remove the memory intensive CARv2 writer implementation since it needs to be reimplemented anyway and not used. We will release writers if needed when it comes to SelectiveCar writer implementation. Just now it is not clear if we want a CARv2 writer that works with the deprecated `format.NodeGetter`. Add tests for V1 wrapping functionality. Reformat code in repo for consistency using `gofumpt`.
-
Masih H. Derkani authored
Add examples that show how to read and write an index to/from file. Test marshalling and unmarshalling index files. Run `gofumt` on repo.
-
Daniel Martí authored
We've agreed to have unified options, since many will be shared between the root and blockstore packages. Include docs, and update the tests.
-
Masih H. Derkani authored
Improve reader type conversion by checking if type satisfies ReaderAt to avoid unnecessary wrapping. Move io converters into one place. Fixes: - https://github.com/ipld/go-car/issues/145
-
Daniel Martí authored
Fixes #169.
-
Masih H. Derkani authored
Remove the duplicate test CARv1 file that existed both in `testdata` and `blockstore/testdata` in favour of the one in root. Reflect change in tests. Duplicity checked using `md5` digest: ```sh $ md5 testdata/sample-v1.car MD5 (testdata/sample-v1.car) = 14fcffc271e50bc56cfce744d462a9bd $ md5 blockstore/testdata/test.car MD5 (blockstore/testdata/test.car) = 14fcffc271e50bc56cfce744d462a9bd ```
-
Masih H. Derkani authored
Fix an issue where single width index not finding a given key returns `0` offset instead of `index.ErrNotFound`. Reflect the changes in blockstore to return appropriate IPFS blockstore error. Add tests that asserts error types both in index and blockstore packages. Remove redundant TODOs. Relates to: - https://github.com/ipld/go-car/pull/158
-
Masih H. Derkani authored
Use both CID v0 and v1 in testing `ReadWrite` blockstore. Use consistent import package name `merkledag` across tests.
-
Masih H. Derkani authored
Add tests that asserts: - when padding options are set the payload is as expected - when finalized, blockstore calls panic - when resumed from mismatching padding error is as expected - when resumed from non-v2 file error is as expected Remove redundant TODOs in code
-
Daniel Martí authored
All the "New" APIs take IO interfaces, so they don't open any file by themselves. However, the APIs that take a path to disk and return a blockstore or a reader need closing, so the prefix "Open" helps clarify that. Plus, it makes names more consistent.
-
Daniel Martí authored
Fixes #91.
-
Masih H. Derkani authored
Unexport the CARv2 writer API until it is reimplemented using WriterAt or WriteSeeker to be more efficient. This API is not used anyway and we can postpone releasing it until SelectiveCar writer API is figured out. For now we unexport it to carry on with interface finalization and alpha release.
-
Masih H. Derkani authored
Implement the ability to generate index from a CARv1 payload given only an `io.Reader`, where the previous implementation required `io.ReadSeeker`. The rationale is to be minimal in what we expect in the API, since index generation from a CARv1 payload never need to rewind the reader and only moves forward in the stream. Refactor utility IO functions that convert between types in one place. Implement constructor functions that only instantiate wrappers when the passed argument does not satisfy a required interface. Fixes: - https://github.com/ipld/go-car/issues/146 Relates to: - https://github.com/ipld/go-car/issues/145
-
Masih H. Derkani authored
Remove dependency to `bufio.Reader` in internal `carv1` package that seems to be mainly used for peeking a byte to return appropriate error when stream abruptly ends, relating to #36. This allows simplification of code across the repo and remove all unnecessary wrappings of `io.Reader` with `bufio.Reader`. This will also aid simplify the internal IO utilities which will be done in future PRs. For now we simply remove dependency to `bufio.Reader` See: - https://github.com/ipld/go-car/pull/36
-
aarshkshah1992 authored
-
Masih H. Derkani authored
Support resumption from a complete CARv2 file, effectively implementing append block to a CARv2 file. Note, on resumption the entire file is re-indexed. So it is undesirable to do this for complete cars. Fix a bug where `AllKeysChan` failed on files that contained the index and resumed from. Fix is done by truncating the CARv2 file, removing the index if it is present when blockstore is constructed when resuming. This is because if `AllKeysChan` is called on a file that contains the index, the ReadOnly backing will continue to read until EOF which will result in error. Since file is re-indexed on resumption and we require finalize to be called we truncate the file to remove index. Write header on every put so that resuming from finalized files that put more blocks without then finalizing works as expected. Otherwise, because the last put did not finalize and header is not updated any blocks put will effectively be lost. Add tests that assert resume with and without finalization. Add tests that assert read operations work on resumed read-write blockstore. Fixes: https://github.com/ipld/go-car/issues/155 Address review comments Avoid having to write car v2 header on every put by unfinalizing the blockstore first if resumed from a finalized file. Fix random seed for tests.
-
Masih H. Derkani authored
Move index generation functionality to root, because they read/write CAR files. This leaves the `index` package to only contain the index implementation itself. Move `index.Attach` to root while at it following the same logic. Add TODOs to the implementation of Attach to make it more robust. As is it _could_ result in corrupt v2 files if offset includes padding since v2 header is not updated in this function. Add tests to generate index and move existing tests to correspond to the go file containing generation functionality.
-
Masih H. Derkani authored
Assume the read seeker passed in is seeked to the beginning of CAR payload. This is both for consistency of API and avoid unnecessary seeks when the reader may already be at the right place. Address review comments * Use the file already open to get the stats for resumption purposes. On getting the stats, because the open flag contains `os.O_CREATE` check the size of the file as a way to determine wheter we should attempt resumption or not. In practice this is the same as the code that explicitly differentiates from non-existing files, since file existence doesn't matter here; the key thing is the file content. Plus this also avoids unnecessary errors where the files exists but is empty. * Add TODOs in places to consider enabling deduplication by default and optimise computational complexity of roots check. Note, explicitly enable deduplication by default in a separate PR so that the change is clearly communicated to dependant clients in case it causes any unintended side-effects. * Rename `Equals` to `Matches` to avoid confusion about what it does. Relates to: - https://github.com/ipld/go-car/pull/147#discussion_r668587909 - https://github.com/ipld/go-car/pull/147#discussion_r668603050 - https://github.com/ipld/go-car/pull/147#discussion_r668609927
-
Masih H. Derkani authored
Use the codec dedicated to CAR index sorted when marshalling and unmarshalling `indexSorted`. Note, the code depends on a specific commit of `go-multicodec` `master` branch. This needs to be replaced with a tag once a release is made on the go-multicodec side later. Unexport index APIs that should not be exposed publicly. Remove `Builder` now that it is not needed anywhere. Move `insertionIndex` into `blockstore` package since that's the only place it is used. Introduce an index constructor that takes multicodec code and instantiates an index. Fix ignored errors in `indexsorted.go` during marshalling/unmarshlling. Rename index constructor functions to use consistent terminology; i.e. `new` instead if `mk`. Remove redundant TODOs in code. Relates to: - https://github.com/multiformats/go-multicodec/pull/46 Address review comments * Rename constructor of index by codec to a simpler name and update docs. * Use multicodec.Code as constant instead of wrappint unit64 every time.
-
Masih H. Derkani authored
Update typo in docs while at it.
-
Masih H. Derkani authored
Implement tests that check `ReadOnly` blockstore functions work based on data payload originated from a V1 or V2 formatted CAR file. Implement tests that assert instantiation of `ReadOnly` blockstore fails on unexpected pragma version.
-
Masih H. Derkani authored
Fix a bug where the returned size in `ReadOnly.GetSize` blockstore was incorrect and the CID was being read from the wrong offset. Implement tests to assert returned size matches the block raw data size. Fixes #133
-
Masih H. Derkani authored
When file exists, attempt resumption by default. This seems like a more user-friendly API and less requirements to explicitly tune knobs. Explicitly fail of the file is finalized. We technically can append to a finalized file but that involves changes in multiple places in reader abstractions. Left a TODO as a feature for future when asked for. Close off file if construction of ReadWrite blockstore fails after file is opened. If left open, it causes test failures when t cleans up temp files.
-
Masih H. Derkani authored
When resumption is enabled, we do not want to create a file if it does not exist. Similarly, when resumption is disabled, fail if the given file at path already exists to avoid unintended file overrides and creations.
-
Masih H. Derkani authored
* Implement an option for read-write blockstore, that if enabled the write can resume from where the writer left off. For resumption to work the `WithResumption` option needs to be set explicitly. Otherwise, if path to an existing file is passed, the blockstore construction will return an error. The resumption requires the roots passed to constructor as well as padding options to be identical with roots in file. Resumption only works on paths where at least V2 pragma and CAR v1 header was successfully written onto the file. Otherwise an error is returned. * Implement resumption test that verifies files resumed from match expected header, data and index. * Implement a CAR v1 equals function to check if two given headers are identical. This implementation requires exact ordering of root elements. A TODO is left to relax the exact ordering requirement. * Implement Seeker in internal offset writer in order to forward offset of CAR v1 writer within a resumed read-write blockstore after resumption. The offset of the writer needs to be set to the latest written frame in order for consecutive writes to be at the right offset. * Fix bug in offset read seeker in internal IO, where seek and returned position was not normalized by the base. Reflect the fix in read-only blockstore AllKeysChan where reader was twisted to work. We now read the header to get its size, then seek past it to then iterate over blocks to populate the channel. * Add TODOs in places to make treating zero-length frames as EOF optional; See #140 for context. * Run `gofumpt -l -w .` on everything to maintain consistent formatting. Address review comments * Implement equality check for CAR v1 headers where roots in different order are considered to be equal. * Improve resumption docs to clarify what matching roots mean.
-
Masih H. Derkani authored
Simply use mutex where reader will only do reading.
-
Masih H. Derkani authored
A user has to undergo a lot of boilerplate code to generate or get an index where there is a CAR file, the version for which is unknown. Provide a function that accepts either v1 or v2 payload and reads the index if present, otherwise generates it for the user. The generated index lives entirely in memory and will not depend on the given `io.ReadSeeker` to lookup indices. Implement tests that check index is returned for v1 or v2 files. Add a sample file with an imaginary version of 42 for testing purposes to assert that the indexing function will error if the given file is not v1 or v2. See: https://github.com/ipld/go-car/issues/143
-
Daniel Martí authored
Fixes #140.
-
Masih H. Derkani authored
Since `blockstore.NewReadOnly` accepts both versions, and the `attach` flag is never used in `OpenReadOnly` make things consistent and arguably more useful by accepting both v1 and v2 CAR files. This also avoids modifying files passed to a read-only API which is always sweet as sugar.
-
Masih H. Derkani authored
Accommodate to the `index.Generate` signature change, done in #136. Insead of implementng a new reader that converts `io.ReaderAt` to `io.ReadSeaker`, make `internalio.OffsetReader` partially implement `Seek`. The "partial" refers to inability to satisfy `Seek` calls with `io.SeekEnd` whence since the point of `internalio.OffsetReader` is that it needs not to know the total size of a file, i.e. cannot know its end. Instead, it panics if `Seek` is called with whence `io.SeekEnd`. None of this matterns much since it is all placed under internal APIs.
-
Masih H. Derkani authored
Rename the `blockstore.ReadOnlyOf` to `blockstore.NewReadOnly` for better readability and consistent naming. Allow the function to accept both v1 and v2 payloads and "do the right thing" depending on the input. Optionally, allow the caller to supply an index that, if present, will overrider any existing index. Otherwise an in-memory index is generated. Note, `blockstore.OpenReadOnly` only accepts v2 payload. Future PRs will change this function to also accept both versions for consistency, and drop modification of given path, delegating any writing to the write-specific APIs like `index.Attach` and `index.Generate`. Fixes #128
-
Masih H. Derkani authored
To guarantee minimal encoding Fixes #135 Reflect on review comments Use `varint.PutUvarint` when dealing with writing bytes.
-
Daniel Martí authored
This requires index.Generate to work on io.ReadSeeker, which is a good idea anyway, as it just needs to read and seek. The main advantage of ReaderAt is that it allows concurrent reads, but index generation is sequential by nature. Plus, it only needs seeking to skip blocks; it doesn't need random access to the carv1. Also make the WrapV1 example test that the resulting carv2 works, including the index. Fixes #129.
-
Daniel Martí authored
As per the TODO. Also delete the other TODO. An index doesn't have any paddings or large chunks to be skipped, so ReaderAt is not necessary nor useful. The index package's ReadFrom works on a Reader, too.
-
Daniel Martí authored
The go-cid change hasn't been merged yet, but it's already had two reviews, and it looks like it will get merged early next week. We can move to go-cid master before the final release.
-
Daniel Martí authored
go-cid's CidFromBytes does exactly what we want already.
-
Daniel Martí authored
And a test that uses duplicate hashes as well as duplicate CIDs. We reuse the same insertion index, since it's enough for this purpose. There's no need to keep a separate map or set of CIDs. While at it, make the index package not silently swallow errors, and improve the tests to handle errors more consistently. Fixes #123. Fixes #125.
-
Daniel Martí authored
See the added doc and example. Also needed to make index marshaling deterministic, as per the spec.
-
Daniel Martí authored
IPLD traversals would sporadically fail due to data races, since they do concurrent blockstore Puts. The added test reproduced that kind of error very reliably. Fixes #121.
-
Daniel Martí authored
Fixes #116.
-