    Feat: Implement a PersistentDatastore by adding DiskUsage method (#27) · a095ff54
    Hector Sanjuan authored
    * Feat: Implement a PersistentDatastore by adding DiskUsage method
    
    This adds DiskUsage().
    
    This datastore would have a big performance hit if we walked the
    filesystem to calculate disk usage every time.
    
    Therefore I have opted to keep tabs of current disk usage by
    walking the filesystem once during "Open" and then adding/subtracting
    file sizes on Put/Delete operations.
    
    On the plus:
      * Small perf impact
      * Always up to date values
      * No chance that race conditions will leave DiskUsage with wrong values
    
    On the minus:
      * Slower Open() - it runs Stat() on all files in the datastore
      * Size does not match real size if a directory grows large
        (at least on ext4 systems). We don't track directory-size changes,
        only use the creation size.
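
    The bookkeeping described above can be sketched as follows. This is a
    minimal illustration, not the code in flatfs.go: the usageTracker type
    and its onPut/onDelete methods are hypothetical names, and the initial
    value stands in for the result of the walk done during Open().

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// usageTracker is a hypothetical stand-in for the datastore's bookkeeping;
// the real flatfs field and method names may differ.
type usageTracker struct {
	diskUsage int64 // bytes, only accessed through sync/atomic
}

// onPut records the size of a newly written file.
func (u *usageTracker) onPut(size int64) { atomic.AddInt64(&u.diskUsage, size) }

// onDelete subtracts the size of a removed file.
func (u *usageTracker) onDelete(size int64) { atomic.AddInt64(&u.diskUsage, -size) }

// DiskUsage returns the running total without touching the filesystem.
func (u *usageTracker) DiskUsage() int64 { return atomic.LoadInt64(&u.diskUsage) }

func main() {
	// Pretend the walk during Open() found 4096 bytes.
	u := &usageTracker{diskUsage: 4096}

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			u.onPut(10)   // a 10-byte file is written
			u.onDelete(4) // a 4-byte file is removed
		}()
	}
	wg.Wait()

	fmt.Println(u.DiskUsage()) // 4096 + 100*(10-4) = 4696
}
```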
    
    * Update .travis.yml: latest go
    
    * DiskUsage: cache diskUsage on Close()
    
    Avoids walking the whole datastore when a clean shutdown happened.
    
    File is removed on read, so a non-cleanly-shutdown datastore
    will not find an outdated file later.
    
    * Manage diskUsage with atomic.AddInt64 (no channel). Use tmp file + rename.
    
    * Remove redundant comments
    
    * Address race conditions when writing/deleting the same key concurrently
    
    This improves diskUsage book-keeping when writing and deleting the same
    key concurrently. However, it means that existing values in the datastore
    cannot be replaced without an explicit delete (before put).
    
    A new test checks that there are no double counts in a put/delete race
    condition environment. This holds when sync is enabled. Without syncing,
    deleting files concurrently with a put causes slight over-counting.
    
    * Document that datastore Put does not replace values
    
    * Comment TestPutOverwrite
    
    * Implement locking and discard for concurrent operations on the same key
    
    This implements the approach suggested by @stebalien in
    https://github.com/ipfs/go-ds-flatfs/pull/27
    
    Write operations (delete/put) to the same key are tracked in a map
    which provides a shared lock. Concurrent operations to that key
    will share that lock. If one operation succeeds, it will remove
    the lock from the map and the others using it will automatically
    succeed. If one operation fails, it will let the others waiting
    for the lock try.
    
    New operations to that key will request a new lock.
    
    A new test for putMany (batching) has been added.
    
    Worth noting: a concurrent Put+Delete on a non-existing key
    always yields Put as the winner (delete will fail if it comes first,
    or will be skipped if it comes second).
    
    * Do fewer operations in tests (travis fails on mac)
    
    * Reduce counts again
    
    * DiskUsage: address comments. Use sync.Map.
    
    * Add rw and rwundo rules to Makefile
    
    * DiskUsage: use one-off locks for operations
    
    Per @stebalien's suggestion.
    
    * DiskUsage: write checkpoint file when du changes by more than 1 percent
    
    Meaning, if the difference between the checkpoint file value and the current
    is more than one percent, we checkpoint it.
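
    The threshold check could look roughly like this. The shouldCheckpoint
    helper is hypothetical, and measuring the one percent relative to the
    checkpointed value (rather than the current one) is an assumption.

```go
package main

import "fmt"

// shouldCheckpoint reports whether current differs from the last
// checkpointed value by more than one percent of the checkpointed value.
// Integer arithmetic avoids floating-point rounding at the boundary.
func shouldCheckpoint(checkpointed, current int64) bool {
	diff := current - checkpointed
	if diff < 0 {
		diff = -diff
	}
	return diff*100 > checkpointed
}

func main() {
	fmt.Println(shouldCheckpoint(1000, 1005)) // false: 0.5% change
	fmt.Println(shouldCheckpoint(1000, 1011)) // true: 1.1% growth
	fmt.Println(shouldCheckpoint(1000, 980))  // true: 2% shrink
}
```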
    
    * Fix tests so they ignore disk usage cache file
    
    * Rename: update disk usage when rename fails too.
    
    * Improve rename comment and be less explicit on field initialization
    
    * Do not use filepath.Walk, use Readdir instead.
    
    * Estimate diskUsage for folders with more than 100 files
    
    This will estimate disk usage when folders have more than
    100 files in them. Unprocessed files are assumed to have
    the average size of the processed ones.
    
    * Select file randomly when there are too many to read
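
    The estimation can be sketched on a plain slice of sizes. This is a
    simplified illustration: estimateTotal is a hypothetical helper, and
    indexing into the slice stands in for the Stat() call flatfs would make
    on each sampled file.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maxSample is the per-folder limit mentioned in the commit message.
const maxSample = 100

// estimateTotal sums a random sample of at most maxSample sizes and assumes
// every remaining entry has the sample's average size.
func estimateTotal(sizes []int64) int64 {
	n := len(sizes)
	if n == 0 {
		return 0
	}
	sample := n
	if sample > maxSample {
		sample = maxSample
	}
	var sum int64
	for _, i := range rand.Perm(n)[:sample] {
		sum += sizes[i] // stands in for a Stat() on a randomly chosen file
	}
	avg := sum / int64(sample)
	return sum + avg*int64(n-sample)
}

func main() {
	// 250 files of 4 KiB each: only 100 are "read", the rest are averaged.
	sizes := make([]int64, 250)
	for i := range sizes {
		sizes[i] = 4096
	}
	fmt.Println(estimateTotal(sizes)) // 250 * 4096 = 1024000
}
```

    With uniform sizes the estimate is exact; with mixed sizes it varies
    with the random sample.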
    
    * Fix typo
    
    * fix tests
    
    * Set time deadline to 5 minutes.
    
    This provides a disk estimation deadline. We will stat() as many
    files as possible until we run out of time. If that happens,
    the rest will be calculated as an average.
    
    The user is informed of the slow operation and, if we ran out of time,
    about how to obtain better accuracy.
flatfs.go 20 KB