1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-18 21:55:45 +02:00

9 Commits

Author SHA1 Message Date
Zygo Blaxell
3fdc217b4f bees: change formatting for physical bytenr ranges in dedup
Use a different character to make it easier to search for bytenr ranges
in the logs.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit d43199e3d6e6469264eb10de8b0a783f8573e0e8)
2017-06-17 10:15:08 -04:00
Zygo Blaxell
6c8d2bf428 bees: limit FD cache size explicitly
This will allow the default size limit for cache objects to be changed
with impunity.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit 9daa51edaab44c02ce0917ff94b20683036d7594)
2017-06-17 10:15:08 -04:00
Zygo Blaxell
dc00dce842 context: purge FD cache every COMMIT_INTERVAL
Holding file FDs open for long periods of time delays inode destruction.
For very large files this can lead to excessive delays while bees dedups
data that will cease to be reachable.

Use the same workaround for file FDs (in the root_ino cache) that
is used for subvols (in the root cache):  forcibly close all cached
FDs at regular intervals.  The FD cache will reacquire FDs from files
that still have existing paths, and will abandon FDs from files that
no longer have existing paths.  The non-existing-path case is not new
(bees has always been able to discover deleted inodes) so it is already
handled by existing code.

Fixes: https://github.com/Zygo/bees/issues/18

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-02-08 22:01:00 -05:00
Zygo Blaxell
db8ea92133 bees: fix further instances of copy-after-unlock bug
Before:

        unique_lock<mutex> lock(some_mutex);
        // run lock.~unique_lock() because return
        // return reference to unprotected heap
        return foo[bar];

After:

        unique_lock<mutex> lock(some_mutex);
        // make copy of object on heap protected by mutex lock
        auto tmp_copy = foo[bar];
        // run lock.~unique_lock() because return
        // pass locally allocated object to copy constructor
        return tmp_copy;

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-22 22:00:27 -05:00
Zygo Blaxell
eec80944cd roots: add a counter for crawl_ms, open_root and open_root_ino
Linux kernel commit 7f8e406 ("btrfs: improve delayed refs iterations")
seems to dramatically improve LOGICAL_INO performance.  Hopefully this
commit will find its way into mainline Linux soon.

This means that most of the time in Bees is now spent on block reading
(50-75%); however, there is still a big gap between block read and
the sum of everything else we are measuring with the "*_ms" counters.
This gap is about 30% of the run time, so it would be good to find out
what's in the gap.

Add ms counters around the crawl and open calls to capture where we are
spending all the time.
2016-12-08 23:55:39 -05:00
Zygo Blaxell
642581e89a hash: remove the experimental shared hash-table and shared mmap features
The experiments are over, and the results were not success.

Having two filesystems cohabiting in the same hash table results in a
lot of false positives, each of which requires some heavy IO to resolve.

Using MAP_SHARED to share a beeshash.dat between processes results in
catastrophically bad performance.

These features were abandoned long ago, but some of the code--and even
worse, its documentation--still remains.

Bees wants a hash table false positive rate below 0.1%.  With a shared
hash table the FP rate is about the same as the dedup rate.  Typically
duplicate files on one filesystem are duplicate on many filesystems.

One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation
produce extremely poor performance results.  A five-order-of-magnitude
speedup was achieved by implementing paging in userspace with worker
threads.  We no longer need the support code for the MAP_SHARED case.

It is still possible to run many BeesContexts in a single process,
but now the only thing contexts share is the FD cache.
2016-12-02 00:26:02 -05:00
Zygo Blaxell
fdfa78a81b context: default and relative BEESHOME
Allow relative paths with BEESHOME.  These paths will be relative
to the root of the dedup target filesystem.

BEESHOME is now optional.  If not specified, '.beeshome' is used.

We don't try to create BEESHOME if it doesn't exist.  BEESHOME might
not be on a btrfs filesystem, so we can't insist it be a subvol.
2016-12-02 00:22:18 -05:00
Zygo Blaxell
1303fb9da8 build: fix FTBFS on GCC 6.2
I'm not surprised that GCC 6 doesn't let me send an ostream ref to itself,
even inside an uninstantiated template specialization.  I am a little
surprised I was trying to, and 4.9 let me get away with it.

It's 2016.  auto_ptr is deprecated now.

Some things were including vector that don't any more.

https://github.com/Zygo/bees/issues/1
2016-11-24 22:20:11 -05:00
Zygo Blaxell
cca0ee26a8 bees: remove local cruft, throw at github 2016-11-17 12:12:13 -05:00