mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

docs: minor changes to how-it-works based on past user questions

Clarify that "too large" and "too small" are some distance away from each other.
The Goldilocks zone is _wide_.

The interval between cache drops is now shorter.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Zygo Blaxell 2023-02-25 03:12:15 -05:00
parent c92ba117d8
commit 73834beb5a


@@ -8,10 +8,12 @@ bees uses checkpoints for persistence to eliminate the IO overhead of a
 transactional data store. On restart, bees will dedupe any data that
 was added to the filesystem since the last checkpoint. Checkpoints
 occur every 15 minutes for scan progress, stored in `beescrawl.dat`.
-The hash table trickle-writes to disk at 4GB/hour to `beeshash.dat`.
-An hourly performance report is written to `beesstats.txt`. There are
-no special requirements for bees hash table storage--`.beeshome` could
-be stored on a different btrfs filesystem, ext4, or even CIFS.
+The hash table trickle-writes to disk at 128KiB/s to `beeshash.dat`,
+but will flush immediately if bees is terminated by SIGTERM.
+There are no special requirements for bees hash table storage--`.beeshome`
+could be stored on a different btrfs filesystem, ext4, or even CIFS (but
+not MS-DOS--beeshome does need filenames longer than 8.3).
 bees uses a persistent dedupe hash table with a fixed size configured
 by the user. Any size of hash table can be dedicated to dedupe. If a
@@ -20,7 +22,7 @@ small as 128KB.
 The bees hash table is loaded into RAM at startup and `mlock`ed so it
 will not be swapped out by the kernel (if swap is permitted, performance
-degrades to nearly zero).
+degrades to nearly zero, for both bees and the swap device).
 bees scans the filesystem in a single pass which removes duplicate
 extents immediately after they are detected. There are no distinct
@@ -83,12 +85,12 @@ of these functions in userspace, at the expense of encountering [some
 kernel bugs in `LOGICAL_INO` performance](btrfs-kernel.md).
 bees uses only the data-safe `FILE_EXTENT_SAME` (aka `FIDEDUPERANGE`)
-kernel operations to manipulate user data, so it can dedupe live data
-(e.g. build servers, sqlite databases, VM disk images). It does not
-modify file attributes or timestamps.
+kernel ioctl to manipulate user data, so it can dedupe live data
+(e.g. build servers, sqlite databases, VM disk images). bees does not
+modify file attributes or timestamps in deduplicated files.
-When bees has scanned all of the data, bees will pause until 10
-transactions have been completed in the btrfs filesystem. bees tracks
+When bees has scanned all of the data, bees will pause until a new
+transaction has completed in the btrfs filesystem. bees tracks
 the current btrfs transaction ID over time so that it polls less often
 on quiescent filesystems and more often on busy filesystems.
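
For a sense of scale for the 128KiB/s trickle-write rate in the first hunk, the following is a minimal sketch and not bees code. It assumes one sequential pass over the whole hash table at a constant rate; the function names `full_pass_seconds` and `bytes_per_hour` are hypothetical, and the real writer may skip unchanged regions or behave differently.

```python
# Back-of-the-envelope arithmetic for a constant-rate trickle writer.
# Assumption (not taken from bees itself): the writer makes one sequential
# pass over the entire table at a fixed rate, with no skipped regions.

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def full_pass_seconds(table_bytes: int, rate_bytes_per_s: int = 128 * KIB) -> float:
    """Seconds needed to write table_bytes once at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return table_bytes / rate_bytes_per_s


def bytes_per_hour(rate_bytes_per_s: int = 128 * KIB) -> int:
    """Total bytes written in one hour at a constant rate."""
    return rate_bytes_per_s * 3600


if __name__ == "__main__":
    for size_gib in (1, 4, 16):
        hours = full_pass_seconds(size_gib * GIB) / 3600
        print(f"{size_gib} GiB table: {hours:.2f} hours per full pass")
    print(f"hourly volume: {bytes_per_hour() // MIB} MiB")
```

Under these assumptions, 128KiB/s works out to 450 MiB per hour, and a 1 GiB table takes about 2.28 hours for a full pass. That is well below the 4GB/hour figure in the removed text.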