Mirror of https://github.com/Zygo/bees.git, synced 2025-05-17 13:25:45 +02:00
README.md: answer some questions that came in after release
parent 74de78947d
commit 876b76d761
44
README.md
@@ -7,19 +7,18 @@ About Bees
 ----------
 
 Bees is a daemon designed to run continuously on live file servers.
-Bees consumes entire filesystems and deduplicates in a single pass, using
-minimal RAM to store data. Bees maintains persistent state so it can be
-interrupted and resumed, whether by planned upgrades or unplanned crashes.
-Bees makes continuous incremental progress instead of using separate
-scan and dedup phases. Bees uses the Linux kernel's `dedupe_file_range`
-system call to ensure data is handled safely even if other applications
-concurrently modify it.
+Bees scans and deduplicates whole filesystems in a single pass instead
+of separate scan and dedup phases. RAM usage does _not_ depend on
+unique data size or the number of input files. Hash tables and scan
+progress are stored persistently so the daemon can resume after a reboot.
+Bees uses the Linux kernel's `dedupe_file_range` feature to ensure data
+is handled safely even if other applications concurrently modify it.
 
 Bees is intentionally btrfs-specific for performance and capability.
-Bees uses the btrfs `SEARCH_V2` ioctl to scan for new data
-without the overhead of repeatedly walking filesystem trees with the
-POSIX API. Bees uses `LOGICAL_INO` and `INO_PATHS` to leverage btrfs's
-existing metadata instead of building its own redundant data structures.
+Bees uses the btrfs `SEARCH_V2` ioctl to scan for new data without the
+overhead of repeatedly walking filesystem trees with the POSIX API.
+Bees uses `LOGICAL_INO` and `INO_PATHS` to leverage btrfs's existing
+metadata instead of building its own redundant data structures.
 Bees can cope with Btrfs filesystem compression. Bees can reassemble
 Btrfs extents to deduplicate extents that contain a mix of duplicate
 and unique data blocks.
@@ -37,7 +36,8 @@ using a weighted sampling algorithm. This allows Bees to adapt itself
 to its filesystem size without forcing admins to do math at install time.
 At the same time, the duplicate block alignment constraint can be as low
 as 4K, allowing efficient deduplication of files with narrowly-aligned
-duplicate block offsets (e.g. compiled binaries and VM/disk images).
+duplicate block offsets (e.g. compiled binaries and VM/disk images)
+even if the effective block size is much larger.
 
 The Bees hash table is loaded into RAM at startup (using hugepages if
 available), mlocked, and synced to persistent storage by trickle-writing
@@ -78,6 +78,12 @@ and some metadata bits). Each entry represents a minimum of 4K on disk.
 1TB 16MB 1024K
 64TB 1GB 1024K
 
+It is possible to resize the hash table by changing the size of
+`beeshash.dat` (e.g. with `truncate`) and restarting `bees`. This
+does not preserve all the existing hash table entries, but it does
+preserve more than zero of them--especially if the old and new sizes
+are a power-of-two multiple of each other.
+
 Things You Might Expect That Bees Doesn't Have
 ----------------------------------------------
 
@@ -113,6 +119,16 @@ this was removed because it made Bees too aggressive to coexist with
 other applications on the same machine. It also hit the *slow backrefs*
 on N CPU cores instead of just one.
 
+* Block reads are currently more allocation- and CPU-intensive than they
+should be, especially for filesystems on SSD where the IO overhead is
+much smaller. This is a problem for power-constrained environments
+(e.g. laptops with slow CPU).
+
+* Bees can currently fragment extents when required to remove duplicate
+blocks, but has no defragmentation capability yet. When possible, Bees
+will attempt to work with existing extent boundaries, but it will not
+aggregate blocks together from multiple extents to create larger ones.
+
 Good Btrfs Feature Interactions
 -------------------------------
 
@@ -340,9 +356,9 @@ Use a bind mount, and let only bees access it:
 
 Reduce CPU and IO priority to be kinder to other applications
 sharing this host (or raise them for more aggressive disk space
-recovery). If you use cgroups, put bees in its own cgroup, then reduce
+recovery). If you use cgroups, put `bees` in its own cgroup, then reduce
 the `blkio.weight` and `cpu.shares` parameters. You can also use
-`schedtool` and `ionice in the shell script that launches bees:
+`schedtool` and `ionice` in the shell script that launches `bees`:
 
 schedtool -D -n20 $$
 ionice -c3 -p $$
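
The resize procedure this commit documents (change the size of `beeshash.dat`, then restart `bees`) can be rehearsed on a scratch file first. The following is a minimal sketch, not taken from the bees sources: the `resize_hash_table` helper, the temporary directory, and the 16M and 32M sizes are assumptions chosen for illustration, and it relies on GNU `truncate`. Doubling the size keeps the old and new sizes a power-of-two multiple of each other, which the README text says preserves more entries.

```shell
#!/bin/sh
# Hypothetical helper (not part of bees): set a hash table file to a new size.
# Usage: resize_hash_table FILE SIZE   (SIZE as accepted by GNU truncate, e.g. 32M)
resize_hash_table() {
    truncate -s "$2" "$1"
}

# Rehearse on a scratch file instead of a live beeshash.dat.
scratch_dir=$(mktemp -d)
scratch_file="$scratch_dir/beeshash.dat"

truncate -s 16M "$scratch_file"          # stand-in for the existing table
resize_hash_table "$scratch_file" 32M    # double it: a power-of-two multiple

wc -c < "$scratch_file"                  # print the size in bytes after the resize
rm -rf "$scratch_dir"
```

On a real system the same `truncate` call would be applied to the actual `beeshash.dat` and followed by a restart of `bees`. How the daemon is stopped and started depends on the local setup and is not shown here.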