diff --git a/README.md b/README.md
index a82cdce..ddaea7d 100644
--- a/README.md
+++ b/README.md
@@ -6,30 +6,30 @@ Best-Effort Extent-Same, a btrfs deduplication agent.
 About bees
 ----------
 
-bees is a block-oriented userspace deduplication agent designed for large
-btrfs filesystems. It is an offline dedupe combined with an incremental
-data scan capability to minimize time data spends on disk from write
-to dedupe.
+bees is a block-oriented userspace deduplication agent designed to scale
+up to large btrfs filesystems. It is an offline dedupe combined with
+an incremental data scan capability to minimize time data spends on disk
+from write to dedupe.
 
 Strengths
 ---------
 
- * Space-efficient hash table and matching algorithms - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
- * Daemon incrementally dedupes new data using btrfs tree search
+ * Space-efficient hash table - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
+ * Daemon mode - incrementally dedupes new data as it appears
+ * Largest extents first - recover more free space during fixed maintenance windows
  * Works with btrfs compression - dedupe any combination of compressed and uncompressed files
- * Works around btrfs filesystem structure to free more disk space
+ * Whole-filesystem dedupe - scans data only once, even with snapshots and reflinks
  * Persistent hash table for rapid restart after shutdown
- * Whole-filesystem dedupe - including snapshots
  * Constant hash table size - no increased RAM usage if data set becomes larger
  * Works on live data - no scheduled downtime required
- * Automatic self-throttling based on system load
+ * Automatic self-throttling - reduces system load
+ * btrfs support - recovers more free space from btrfs than naive dedupers
 
 Weaknesses
 ----------
 
  * Whole-filesystem dedupe - has no include/exclude filters, does not accept file lists
- * Requires root privilege (or `CAP_SYS_ADMIN`)
- * First run may require temporary disk space for extent reorganization
+ * Requires root privilege (`CAP_SYS_ADMIN` plus the usual filesystem read/modify caps)
  * [First run may increase metadata space usage if many snapshots exist](docs/gotchas.md)
  * Constant hash table size - no decreased RAM usage if data set becomes smaller
  * btrfs only
@@ -46,7 +46,7 @@ Recommended Reading
 -------------------
 
  * [bees Gotchas](docs/gotchas.md)
- * [btrfs kernel bugs](docs/btrfs-kernel.md) - especially DATA CORRUPTION WARNING
+ * [btrfs kernel bugs](docs/btrfs-kernel.md) - especially DATA CORRUPTION WARNING for old kernels
  * [bees vs. other btrfs features](docs/btrfs-other.md)
  * [What to do when something goes wrong](docs/wrong.md)
@@ -69,6 +69,6 @@ You can also use Github:
 Copyright & License
 -------------------
 
-Copyright 2015-2023 Zygo Blaxell .
+Copyright 2015-2025 Zygo Blaxell .
 
 GPL (version 3 or later).
diff --git a/docs/index.md b/docs/index.md
index 4ce7579..e97a0f3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -6,30 +6,30 @@ Best-Effort Extent-Same, a btrfs deduplication agent.
 About bees
 ----------
 
-bees is a block-oriented userspace deduplication agent designed for large
-btrfs filesystems. It is an offline dedupe combined with an incremental
-data scan capability to minimize time data spends on disk from write
-to dedupe.
+bees is a block-oriented userspace deduplication agent designed to scale
+up to large btrfs filesystems. It is an offline dedupe combined with
+an incremental data scan capability to minimize time data spends on disk
+from write to dedupe.
 
 Strengths
 ---------
 
- * Space-efficient hash table and matching algorithms - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
- * Daemon incrementally dedupes new data using btrfs tree search
+ * Space-efficient hash table - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
+ * Daemon mode - incrementally dedupes new data as it appears
+ * Largest extents first - recover more free space during fixed maintenance windows
  * Works with btrfs compression - dedupe any combination of compressed and uncompressed files
- * Works around btrfs filesystem structure to free more disk space
+ * Whole-filesystem dedupe - scans data only once, even with snapshots and reflinks
  * Persistent hash table for rapid restart after shutdown
- * Whole-filesystem dedupe - including snapshots
  * Constant hash table size - no increased RAM usage if data set becomes larger
  * Works on live data - no scheduled downtime required
- * Automatic self-throttling based on system load
+ * Automatic self-throttling - reduces system load
+ * btrfs support - recovers more free space from btrfs than naive dedupers
 
 Weaknesses
 ----------
 
  * Whole-filesystem dedupe - has no include/exclude filters, does not accept file lists
- * Requires root privilege (or `CAP_SYS_ADMIN`)
- * First run may require temporary disk space for extent reorganization
+ * Requires root privilege (`CAP_SYS_ADMIN` plus the usual filesystem read/modify caps)
  * [First run may increase metadata space usage if many snapshots exist](gotchas.md)
  * Constant hash table size - no decreased RAM usage if data set becomes smaller
  * btrfs only
@@ -46,7 +46,7 @@ Recommended Reading
 -------------------
 
  * [bees Gotchas](gotchas.md)
- * [btrfs kernel bugs](btrfs-kernel.md) - especially DATA CORRUPTION WARNING
+ * [btrfs kernel bugs](btrfs-kernel.md) - especially DATA CORRUPTION WARNING for old kernels
  * [bees vs. other btrfs features](btrfs-other.md)
  * [What to do when something goes wrong](wrong.md)
@@ -69,6 +69,6 @@ You can also use Github:
 Copyright & License
 -------------------
 
-Copyright 2015-2023 Zygo Blaxell .
+Copyright 2015-2025 Zygo Blaxell .
 
 GPL (version 3 or later).