flatfs.go · a095ff54c78609a81df33a5fa4757d032bf08a2d · dms3 / go-ds-flatfs

Feat: Implement a PersistentDatastore by adding DiskUsage method (#27) · a095ff54
Hector Sanjuan authored Mar 09, 2018
* Feat: Implement a PersistentDatastore by adding DiskUsage method

This adds DiskUsage().

This datastore would take a big performance hit if we walked the
filesystem to calculate disk usage every time.

Therefore I have opted to keep track of current disk usage by
walking the filesystem once during "Open" and then adding/subtracting
file sizes on Put/Delete operations.

On the plus:
  * Small perf impact
  * Always up to date values
  * No chance that race conditions will leave DiskUsage with wrong values

On the minus:
  * Slower Open() - it runs Stat() on all files in the datastore
  * Reported size does not match the real size if a directory grows large
    (at least on ext4 systems). We don't track directory size changes;
    we only count each directory's size at creation.

* Update .travis.yml: latest go

* DiskUsage: cache diskUsage on Close()

Avoids walking the whole datastore when a clean shutdown happened.

The cache file is removed when it is read, so a datastore that was not
shut down cleanly will not find an outdated file later.

* Manage diskUsage with atomic.AddInt64 (no channel). Use tmp file + rename.

* Remove redundant comments

* Address race conditions when writing/deleting the same key concurrently

This improves diskUsage bookkeeping when writing and deleting the same
key concurrently. However, it means that existing values in the datastore
cannot be replaced without an explicit delete (before put).

A new test checks that there are no double counts when puts and deletes
race. This holds when sync is enabled; without syncing, deleting files
concurrently with puts causes slight over-counting.

* Document that datastore Put does not replace values

* Comment TestPutOverwrite

* Implement locking and discard for concurrent operations on the same key

This implements the approach suggested by @stebalien in
https://github.com/ipfs/go-ds-flatfs/pull/27

Write operations (delete/put) to the same key are tracked in a map
which provides a shared lock. Concurrent operations to that key
will share that lock. If one operation succeeds, it will remove
the lock from the map and the others using it will automatically
succeed. If one operation fails, it will let the others waiting
for the lock try.

New operations to that key will request a new lock.

A new test for putMany (batching) has been added.

Worth noting: a concurrent Put+Delete on a non-existing key
always yields Put as the winner (Delete fails if it comes first,
or is skipped if it comes second).

* Do fewer operations in tests (Travis fails on macOS)

* Reduce counts again

* DiskUsage: address comments. Use sync.Map.

* Add rw and rwundo rules to Makefile

* DiskUsage: use one-off locks for operations

Per @stebalien's suggestion.

* DiskUsage: write checkpoint file when du changes by more than 1 percent

That is, if the current value differs from the value in the checkpoint file
by more than one percent, we write a new checkpoint.
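The threshold check can be sketched as below. The function name is hypothetical, and measuring the change relative to the checkpointed value (rather than the current one) is an assumption.

```go
package main

// diskUsageCheckpointPercent is the threshold described above.
const diskUsageCheckpointPercent = 1.0

// shouldCheckpoint reports whether current differs from the last
// checkpointed value by more than diskUsageCheckpointPercent percent
// of the checkpointed value. Hypothetical helper for illustration.
func shouldCheckpoint(checkpointed, current int64) bool {
	if checkpointed == 0 {
		return current != 0
	}
	diff := current - checkpointed
	if diff < 0 {
		diff = -diff
	}
	return float64(diff)*100/float64(checkpointed) > diskUsageCheckpointPercent
}
```

Checkpointing only on changes larger than the threshold keeps the checkpoint file reasonably fresh without rewriting it on every Put or Delete.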

* Fix tests so they ignore disk usage cache file

* Rename: update disk usage when rename fails too.

* Improve rename comment and be less explicit on field initialization

* Do not use filepath.Walk, use Readdir instead.

* Estimate diskUsage for folders with more than 100 files

This will estimate disk usage when folders have more than
100 files in them. Unprocessed files are assumed to have
the average size of the processed ones.

* Select file randomly when there are too many to read

* Fix typo

* Fix tests

* Set time deadline to 5 minutes.

This sets a deadline for disk usage estimation. We stat() as many
files as possible until we run out of time; if that happens, the
remaining files are assumed to have the average size of the processed ones.

The user is informed of the slow operation and, if we ran out of time,
about how to obtain better accuracy.