mirror of https://github.com/Zygo/bees.git synced 2025-05-17 13:25:45 +02:00

BEES

Best-Effort Extent-Same, a btrfs deduplicator.

TODO

Write some docs here:

  • copyright (Zygo Blaxell 2015-2016), license (GPL3+)
  • what it is
  • what it isn't
  • building it
  • what works
  • what doesn't work
  • a brief history of btrfs kernel bugs
  • things that could have been, and why they aren't
  • roadmap (and anti-roadmap)
  • how to report bugs
  • how to contribute

Build

Requirements:

  • C++11 compiler (I use GCC 4.9)
  • btrfs-progs (I've used 4.1..4.7) for /usr/include/btrfs/*
  • libuuid-dev (TODO: remove the one function we call from this library)

Build with make.

The build produces bin/bees and lib/libcrucible.so. On the target system, copy bin/bees to a directory in $PATH and lib/libcrucible.so to a directory in $LD_LIBRARY_PATH.
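The copy step could be sketched as below. The helper name install_bees and the prefix layout (PREFIX/bin and PREFIX/lib) are assumptions made for illustration, not part of the bees build; the sketch only relies on the standard install utility.

```shell
# Hypothetical helper (not shipped with bees): copy the build outputs
# under a chosen prefix. Run it from the top of the source tree after make.
install_bees() {
    prefix=$1
    # Create the destination directories if they do not exist yet.
    install -d "$prefix/bin" "$prefix/lib"
    # The binary must be executable; the shared library only readable.
    install -m 755 bin/bees "$prefix/bin/bees"
    install -m 644 lib/libcrucible.so "$prefix/lib/libcrucible.so"
}
```

For example, install_bees /usr/local places the files in /usr/local/bin and /usr/local/lib; those directories still need to be listed in $PATH and $LD_LIBRARY_PATH respectively.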

Setup

Create a directory for bees state files:

export BEESHOME=/some/path
mkdir -p "$BEESHOME"

Create an empty hash table (your choice of size, but it must be a multiple of 16M). This example creates a 1GB hash table:

truncate -s 1g "$BEESHOME/beeshash.dat"
chmod 700 "$BEESHOME/beeshash.dat"
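Because the size constraint is easy to get wrong, a quick check along these lines may help. This is a minimal sketch: the helper name hash_size_ok is hypothetical, it assumes that "16M" means 16 MiB (16 * 1024 * 1024 bytes), and it uses the GNU form of stat.

```shell
# Hypothetical helper: succeed only if the file's size is a non-zero
# multiple of 16 MiB (assumed meaning of "16M" in this README).
hash_size_ok() {
    size=$(stat -c %s "$1") || return 1   # size in bytes (GNU stat)
    unit=$((16 * 1024 * 1024))
    [ "$size" -gt 0 ] && [ $((size % unit)) -eq 0 ]
}
```

Running hash_size_ok "$BEESHOME/beeshash.dat" after the truncate command should return success for the 1GB example, since 1 GiB is 64 times 16 MiB.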

Configuration

The only runtime-configurable options are environment variables:

  • BEESHOME: Directory containing Bees state files:
      • beeshash.dat: persistent hash table (must be a multiple of 16M)
      • beescrawl.UUID.dat: state of SEARCH_V2 crawlers
      • beesstats.txt: statistics and performance counters
  • BEESSTATUS: File containing a snapshot of current Bees state (performance counters and current status of each thread).

Other options (e.g. interval between filesystem crawls) can be configured in src/bees.h.

Running

Set BEESHOME to the state directory created in the Setup section:

export BEESHOME=/some/path

Put the BEESSTATUS file on a tmpfs, because it is updated once per second:

export BEESSTATUS=/run/bees.status
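Whether a given directory actually sits on a tmpfs can be checked before starting bees. The sketch below assumes GNU stat; the helper name fs_type_of is hypothetical.

```shell
# Hypothetical helper: print the type of the filesystem that holds the
# given path, using GNU stat's filesystem mode (-f).
fs_type_of() {
    stat -f -c %T "$1"
}
```

Calling fs_type_of on the directory that will contain the status file (for instance /run) should print tmpfs if that directory is a suitable location.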

bees can only process the root subvolume of a btrfs filesystem. Mount that subvolume at a dedicated mount point, and let only bees access it.

mount -osubvol=/ /dev/<your-filesystem> /var/lib/bees/root

Let the bees fly!

bees /var/lib/bees/root >> /var/log/bees.log 2>&1

You'll probably want to arrange for /var/log/bees.log to be rotated periodically. You may also want to set umask to 077 to prevent disclosure of information about the contents of the filesystem through the log file.
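One way to apply the umask advice without changing the umask of the calling shell is a small wrapper, sketched below. The function name run_with_private_log is an assumption for illustration and is not part of bees.

```shell
# Hypothetical wrapper: run a command with umask 077 and append its
# stdout and stderr to a log file. The subshell keeps the umask change
# local, and a newly created log file ends up with mode 600.
run_with_private_log() {
    log_file=$1
    shift
    (
        umask 077
        "$@" >> "$log_file" 2>&1
    )
}
```

With this in place, bees could be started as run_with_private_log /var/log/bees.log bees /var/lib/bees/root. The umask only affects files created after it is set, so a log file that already exists keeps its previous permissions. Rotation itself is left to an external tool such as logrotate.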
