1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00
bees/docs/running.md
Zygo Blaxell 924008603e hash: reduce hash table extent size to 128KB
The 16MB hash table extent size did not serve any useful defragmentation
or compression purpose, and for very small filesystems (under 100GB),
16MB is much larger than necessary.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-10-19 20:21:04 -04:00

93 lines
3.4 KiB
Markdown

Running bees
============
Setup
-----
If you don't want to use the helper script `scripts/beesd` to setup and
configure bees, here's how you manually setup bees.
Create a directory for bees state files:
export BEESHOME=/some/path
mkdir -p "$BEESHOME"
Create an empty hash table ([your choice of size](config.md), but it
must be a multiple of 128KB). This example creates a 1GB hash table:
truncate -s 1g "$BEESHOME/beeshash.dat"
chmod 700 "$BEESHOME/beeshash.dat"
bees can _only_ process the root subvol of a btrfs with nothing mounted
over top. If the bees argument is not the root subvol directory, bees
will just throw an exception and stop.
Use a separate mount point, and let only bees access it:
UUID=3399e413-695a-4b0b-9384-1b0ef8f6c4cd
mkdir -p /var/lib/bees/$UUID
mount /dev/disk/by-uuid/$UUID /var/lib/bees/$UUID -osubvol=/
If you don't set BEESHOME, the path "`.beeshome`" will be used relative
to the root subvol of the filesystem. For example:
btrfs sub create /var/lib/bees/$UUID/.beeshome
truncate -s 1g /var/lib/bees/$UUID/.beeshome/beeshash.dat
chmod 700 /var/lib/bees/$UUID/.beeshome/beeshash.dat
You can use any relative path in `BEESHOME`. The path will be taken
relative to the root of the deduped filesystem (in other words it can
be the name of a subvol):
export BEESHOME=@my-beeshome
btrfs sub create /var/lib/bees/$UUID/$BEESHOME
truncate -s 1g /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
chmod 700 /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
Configuration
-------------
There are some runtime configurable options using environment variables:
* BEESHOME: Directory containing bees state files:
* beeshash.dat | persistent hash table. Must be a multiple of 128KB, and must be created before bees starts.
* beescrawl.dat | state of SEARCH_V2 crawlers. ASCII text. bees will create this.
* beesstats.txt | statistics and performance counters. ASCII text. bees will create this.
* BEESSTATUS: File containing a snapshot of current bees state: performance
counters and current status of each thread. The file is meant to be
human readable, but understanding it probably requires reading the source.
You can watch bees run in realtime with a command like:
watch -n1 cat $BEESSTATUS
Other options (e.g. interval between filesystem crawls) can be configured
in `src/bees.h` or [on the command line](options.md).
Running
-------
Reduce CPU and IO priority to be kinder to other applications sharing
this host (or raise them for more aggressive disk space recovery). If you
use cgroups, put `bees` in its own cgroup, then reduce the `blkio.weight`
and `cpu.shares` parameters. You can also use `schedtool` and `ionice`
in the shell script that launches `bees`:
schedtool -D -n20 $$
ionice -c3 -p $$
You can also use the [`--loadavg-target` and `--thread-min`
options](options.md) to further control the impact of bees on the rest
of the system.
Let the bees fly:
for fs in /var/lib/bees/*-*-*-*-*/; do
bees "$fs" >> "$fs/.beeshome/bees.log" 2>&1 &
done
You'll probably want to arrange for `/var/log/bees.log` to be rotated
periodically. You may also want to set umask to 077 to prevent disclosure
of information about the contents of the filesystem through the log file.
There are also some shell wrappers in the `scripts/` directory.