
Running bees
============

Setup
-----

If you don't want to use the helper script `scripts/beesd` to set up and
configure bees, here is how to set it up manually.

Create a directory for bees state files:

        export BEESHOME=/some/path
        mkdir -p "$BEESHOME"

Create an empty hash table ([your choice of size](config.md), but it
must be a multiple of 128KB).  This example creates a 1GB hash table:

        truncate -s 1g "$BEESHOME/beeshash.dat"
        chmod 700 "$BEESHOME/beeshash.dat"
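
A wrongly sized hash table can be caught before bees starts by checking
the file size.  The following is a minimal sketch, assuming GNU `stat` and
reading "128KB" as 131072 bytes; `check_hash_size` is a hypothetical helper
for illustration, not part of bees:

```shell
# Hypothetical helper (not part of bees): succeed only if the file exists
# and its size is a non-zero multiple of 128 KiB (assumed to be 131072 bytes).
check_hash_size() {
    size=$(stat -c %s -- "$1") || return 1
    [ "$size" -gt 0 ] && [ $((size % 131072)) -eq 0 ]
}
```

It could be called as `check_hash_size "$BEESHOME/beeshash.dat" || echo "bad hash table size"`
before launching bees.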

bees can _only_ process the root subvol of a btrfs filesystem, with nothing
mounted on top of it.  If the path given to bees is not the root subvol
directory, bees will throw an exception and stop.

Use a separate mount point, and let only bees access it:

        UUID=3399e413-695a-4b0b-9384-1b0ef8f6c4cd
        mkdir -p /var/lib/bees/$UUID
        mount /dev/disk/by-uuid/$UUID /var/lib/bees/$UUID -osubvol=/

If you don't set `BEESHOME`, the path `.beeshome` will be used relative
to the root subvol of the filesystem.  For example:

        btrfs sub create /var/lib/bees/$UUID/.beeshome
        truncate -s 1g /var/lib/bees/$UUID/.beeshome/beeshash.dat
        chmod 700 /var/lib/bees/$UUID/.beeshome/beeshash.dat

You can use any relative path in `BEESHOME`.  The path will be taken
relative to the root of the deduped filesystem (in other words it can
be the name of a subvol):

        export BEESHOME=@my-beeshome
        btrfs sub create /var/lib/bees/$UUID/$BEESHOME
        truncate -s 1g /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
        chmod 700 /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
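
The `BEESHOME` rules described above can be summarized in a short sketch.
`resolve_beeshome` is a hypothetical helper that only illustrates the
documented behaviour, not the code bees itself uses; it assumes that an
absolute `BEESHOME` (as in the first example) is used unchanged:

```shell
# Hypothetical helper (not part of bees): print the directory where the
# state files are expected, given the mount point of the root subvol ($1)
# and the value of BEESHOME ($2, may be empty).
resolve_beeshome() {
    root=${1%/}              # drop one trailing slash from the mount point
    home=${2:-.beeshome}     # unset or empty BEESHOME falls back to .beeshome
    case $home in
        /*) printf '%s\n' "$home" ;;          # absolute path: assumed used as-is
        *)  printf '%s\n' "$root/$home" ;;    # relative path: under the root subvol
    esac
}
```

For example, `resolve_beeshome "/var/lib/bees/$UUID" "$BEESHOME"` shows where
`beeshash.dat` has to be created for a given setting.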

Configuration
-------------

Some options can be configured at runtime through environment variables:

* BEESHOME: Directory containing bees state files:
	* beeshash.dat  | persistent hash table.  Must be a multiple of 128KB, and must be created before bees starts.
	* beescrawl.dat | state of SEARCH_V2 crawlers.  ASCII text.  bees will create this.
	* beesstats.txt | statistics and performance counters.  ASCII text.  bees will create this.
* BEESSTATUS: File containing a snapshot of current bees state:  performance
  counters and current status of each thread.  The file is meant to be
  human readable, but understanding it probably requires reading the source.
  You can watch bees run in realtime with a command like:

        watch -n1 cat "$BEESSTATUS"

Other options (e.g. interval between filesystem crawls) can be configured
in `src/bees.h` or [on the command line](options.md).

Running
-------

Reduce CPU and IO priority to be kinder to other applications sharing
this host (or raise them for more aggressive disk space recovery).  If you
use cgroups, put `bees` in its own cgroup, then reduce the `blkio.weight`
and `cpu.shares` parameters.  You can also use `schedtool` and `ionice`
in the shell script that launches `bees`:

        schedtool -D -n20 $$
        ionice -c3 -p $$

You can also use the [`--loadavg-target` and `--thread-min`
options](options.md) to further control the impact of bees on the rest
of the system.

Let the bees fly:

        for fs in /var/lib/bees/*-*-*-*-*/; do
                bees "$fs" >> "$fs/.beeshome/bees.log" 2>&1 &
        done

You'll probably want to arrange for the log files (`.beeshome/bees.log`
on each filesystem in the example above) to be rotated periodically.
You may also want to set umask to 077 to prevent disclosure of information
about the contents of the filesystem through the log file.
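
The effect of the umask on a newly created log file can be sketched as
follows.  `open_private_log` is a hypothetical helper, and the sketch
assumes GNU coreutils; note that the umask only applies when the file is
created, so an existing log keeps its current permissions:

```shell
# Hypothetical helper (not part of bees): create the log file if it does
# not exist yet, with a umask that removes all group and other permissions.
open_private_log() {
    (
        umask 077        # only affects this subshell
        : >> "$1"        # create the file if missing, never truncate it
    )
}
```

In the launch loop, the same result comes from running `umask 077` once
before the `for` statement, so that every `bees.log` created by the
redirection is readable by its owner only.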

There are also some shell wrappers in the `scripts/` directory.