1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-18 21:55:45 +02:00

114 Commits

Author SHA1 Message Date
Zygo Blaxell
6c8d2bf428 bees: limit FD cache size explicitly
This will allow the default size limit for cache objects to be changed
with impunity.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit 9daa51edaab44c02ce0917ff94b20683036d7594)
2017-06-17 10:15:08 -04:00
Zygo Blaxell
dc00dce842 context: purge FD cache every COMMIT_INTERVAL
Holding file FDs open for long periods of time delays inode destruction.
For very large files this can lead to excessive delays while bees dedups
data that will cease to be reachable.

Use the same workaround for file FDs (in the root_ino cache) that
is used for subvols (in the root cache):  forcibly close all cached
FDs at regular intervals.  The FD cache will reacquire FDs from files
that still have existing paths, and will abandon FDs from files that
no longer have existing paths.  The non-existing-path case is not new
(bees has always been able to discover deleted inodes) so it is already
handled by existing code.

Fixes: https://github.com/Zygo/bees/issues/18

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-02-08 22:01:00 -05:00
Zygo Blaxell
5713fcd770 bees: clean up statistics class
Some whitespace fixes.  Remove some duplicate code.  Don't lock
two BeesStats objects in the - operator method.

Get the locking for T& at(const K&) right to avoid locking a mutex
recursively.  Make the non-const version of the function private.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-22 22:00:28 -05:00
Zygo Blaxell
9f120e326b bees: fix deadlock in thread status reporting
"s_name" was a thread_local variable, not static, and did not require a
mutex to protect access.  A deadlock is possible if a thread triggers an
exception with a handler that attempts to log a message (as the top-level
exception handler in bees does).

Remove multiple unnecessary mutex locks.  Rename the thread_local variables
to make their scope clearer.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-15 01:55:34 -05:00
Zygo Blaxell
5e91529ad2 hash: remove the unused m_prefetch_rate_limit
The hash table statistics calculation in BeesHashTable::prefetch_loop
and the data-driven operation of the extent scanner always pulls the
hash table into RAM as fast as the disk will push the data.  We never
use the prefetch rate limit, so remove it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-11 21:15:12 -05:00
Zygo Blaxell
fa8607bae0 crucible: get rid of DefaultBool, just use C++11 initializer syntax
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:23:32 -05:00
Zygo Blaxell
1b261b1ba7 build: move BEES_VERSION to a separate C file to avoid unnecessary building
Every git commit was causing bees.cc and bees-hash.cc to be rebuilt,
which was expensive and unnecessary.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:23:05 -05:00
Zygo Blaxell
e8eaa7e471 trivial: mass purge of whitespace errors
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-06 22:14:50 -05:00
Zygo Blaxell
642581e89a hash: remove the experimental shared hash-table and shared mmap features
The experiments are over, and the results were not success.

Having two filesystems cohabiting in the same hash table results in a
lot of false positives, each of which requires some heavy IO to resolve.

Using MAP_SHARED to share a beeshash.dat between processes results in
catastrophically bad performance.

These features were abandoned long ago, but some of the code--and even
worse, its documentation--still remains.

Bees wants a hash table false positive rate below 0.1%.  With a shared
hash table the FP rate is about the same as the dedup rate.  Typically
duplicate files on one filesystem are duplicate on many filesystems.

One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation
produce extremely poor performance results.  A five-order-of-magnitude
speedup was achieved by implementing paging in userspace with worker
threads.  We no longer need the support code for the MAP_SHARED case.

It is still possible to run many BeesContexts in a single process,
but now the only thing contexts share is the FD cache.
2016-12-02 00:26:02 -05:00
Zygo Blaxell
fdfa78a81b context: default and relative BEESHOME
Allow relative paths with BEESHOME.  These paths will be relative
to the root of the dedup target filesystem.

BEESHOME is now optional.  If not specified, '.beeshome' is used.

We don't try to create BEESHOME if it doesn't exist.  BEESHOME might
not be on a btrfs filesystem, so we can't insist it be a subvol.
2016-12-02 00:22:18 -05:00
Zygo Blaxell
6fa8de660b hash: create beeshash.dat if it does not exist
BeesHashTable can now create a beeshash.dat if the file does not already
exist.  Currently the default size is one hash table extent (16MB) and
there's no way to change that (yet), so users should still create their
own hash tables for now.

The opening of the hash table is deferred (slightly) in preparation for
hash table resizing.

No doc as the feature is currently unfinished.
2016-12-02 00:20:30 -05:00
Zygo Blaxell
d58de9b76d bees: introduce BEESLOGNOTE macro
Quite often we have the same message in BEESLOG and BEESNOTE, so
make a macro to combine them.
2016-12-02 00:20:29 -05:00
Zygo Blaxell
06e111c229 crawl: remove UUID from file names
Unfortunately we don't get to remove the libuuid dependency because
we still want to read a file that exists in the legacy location.
2016-12-02 00:16:03 -05:00
Zygo Blaxell
cca0ee26a8 bees: remove local cruft, throw at github 2016-11-17 12:12:13 -05:00