1. 13 Jun, 2018 1 commit
  2. 09 Mar, 2018 1 commit
      Feat: Implement a PersistentDatastore by adding DiskUsage method (#27) · a095ff54
      Hector Sanjuan authored
      * Feat: Implement a PersistentDatastore by adding DiskUsage method
      
      This adds DiskUsage().
      
      This datastore would take a big performance hit if we walked the
      filesystem to calculate disk usage every time.
      
      Therefore I have opted to keep tabs on current disk usage by
      walking the filesystem once during Open() and then adding/subtracting
      file sizes on Put/Delete operations.
      
      On the plus side:
        * Small perf impact
        * Always up to date values
        * No chance that race conditions will leave DiskUsage with wrong values
      
      On the minus side:
        * Slower Open() - it runs Stat() on all files in the datastore
        * The reported size does not match the real size if a directory
          itself grows large (at least on ext4): we don't track
          directory-size changes, only the size at creation.
      
      * Update .travis.yml: latest go
      
      * DiskUsage: cache diskUsage on Close()
      
      Avoids walking the whole datastore when a clean shutdown happened.
      
      The file is removed on read, so a datastore that was not shut down
      cleanly will not pick up an outdated value later.
      
      * Manage diskUsage with atomic.AddInt64 (no channel). Use tmp file + rename.
      
      * Remove redundant comments
      
      * Address race conditions when writing/deleting the same key concurrently
      
      This improves diskUsage book-keeping when writing and deleting the same
      key concurrently. It does mean, however, that existing values in the
      datastore cannot be replaced without an explicit delete (before put).
      
      A new test checks that there are no double counts in a put/delete race
      condition environment. This holds when sync is enabled; with syncing
      disabled, deleting files concurrently with puts causes small
      over-counting.
      
      * Document that datastore Put does not replace values
      
      * Comment TestPutOverwrite
      
      * Implement locking and discard for concurrent operations on the same key
      
      This implements the approach suggested by @stebalien in
      https://github.com/ipfs/go-ds-flatfs/pull/27
      
      Write operations (delete/put) to the same key are tracked in a map
      which provides a shared lock. Concurrent operations to that key
      will share that lock. If one operation succeeds, it will remove
      the lock from the map and the others using it will automatically
      succeed. If one operation fails, it will let the others waiting
      for the lock try.
      
      New operations to that key will request a new lock.
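      The shared-lock map described above can be sketched roughly as below.
      This is a simplified illustration of the idea, with made-up names
      (`opMap`, `opResult`), not the code merged in the PR:

```go
package main

import (
	"fmt"
	"sync"
)

// opMap hands out one shared lock per key currently being written.
type opMap struct{ ops sync.Map }

type opResult struct {
	mu      sync.Mutex
	success bool
	owner   *opMap
	name    string
}

// Begin returns a locked opResult if we now own the operation for this
// key, or nil if a concurrent operation on the key already succeeded
// (in which case the caller can treat its own operation as done).
func (m *opMap) Begin(name string) *opResult {
	for {
		myOp := &opResult{owner: m, name: name}
		myOp.mu.Lock()
		actual, loaded := m.ops.LoadOrStore(name, myOp)
		if !loaded {
			return myOp // we hold the lock for this key
		}
		myOp.mu.Unlock()
		// Wait for the in-flight operation on this key to finish.
		other := actual.(*opResult)
		other.mu.Lock()
		ok := other.success
		other.mu.Unlock()
		if ok {
			return nil // the work was already done for us
		}
		// The other operation failed: loop and try to take over.
	}
}

// Finish records the outcome, removes the lock from the map so new
// operations request a fresh one, and wakes any waiters sharing it.
func (o *opResult) Finish(ok bool) {
	o.success = ok
	o.owner.ops.Delete(o.name)
	o.mu.Unlock()
}

func main() {
	m := &opMap{}
	op := m.Begin("key1") // first writer takes the key lock
	// ... perform the Put/Delete here ...
	op.Finish(true) // success: the lock is removed from the map

	op2 := m.Begin("key1") // a new operation gets a fresh lock
	fmt.Println(op2 != nil) // true
	op2.Finish(true)
}
```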
      
      A new test for putMany (batching) has been added.
      
      Worth noting: a concurrent Put+Delete on a non-existing key
      always yields Put as the winner (the delete will fail if it comes
      first, or will be skipped if it comes second).
      
      * Do fewer operations in tests (travis fails on mac)
      
      * Reduce counts again
      
      * DiskUsage: address comments. Use sync.Map.
      
      * Add rw and rwundo rules to Makefile
      
      * DiskUsage: use one-off locks for operations
      
      Per @stebalien's suggestion.
      
      * DiskUsage: write checkpoint file when du changes by more than 1 percent
      
      That is, when the current value differs from the checkpointed one by
      more than one percent, we write a new checkpoint.
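      The one-percent rule reduces to a small comparison; a minimal sketch
      (the function name is illustrative):

```go
package main

import "fmt"

// shouldCheckpoint reports whether the in-memory usage has drifted more
// than one percent away from the last persisted value, in which case a
// new checkpoint file should be written.
func shouldCheckpoint(persisted, current int64) bool {
	diff := current - persisted
	if diff < 0 {
		diff = -diff
	}
	return diff > persisted/100
}

func main() {
	fmt.Println(shouldCheckpoint(1000, 1005)) // false: only 0.5% drift
	fmt.Println(shouldCheckpoint(1000, 1020)) // true: 2% drift
}
```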
      
      * Fix tests so they ignore disk usage cache file
      
      * Rename: update disk usage when rename fails too.
      
      * Improve rename comment and be less explicit on field initialization
      
      * Do not use filepath.Walk, use Readdir instead.
      
      * Estimate diskUsage for folders with more than 100 files
      
      This estimates disk usage when folders contain more than
      100 files. Unprocessed files are assumed to have the
      average size of the processed ones.
      
      * Select file randomly when there are too many to read
      
      * Fix typo
      
      * fix tests
      
      * Set time deadline to 5 minutes.
      
      This provides a disk estimation deadline. We will stat() as many
      files as possible until we run out of time. If that happens,
      the rest will be calculated as an average.
      
      The user is informed of the slow operation and, if we ran out of time,
      of how to obtain better accuracy.
  3. 25 Aug, 2016 1 commit