
docs: old missing features are not missing any more

The extent scan mode has been implemented (partially, but close enough
to win benchmarks).

New features include several nuisance dedupe countermeasures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Author: Zygo Blaxell
Date:   2023-02-25 03:13:23 -05:00
Parent: 25f7ced27b
Commit: d5a6c30623


@@ -15,16 +15,9 @@ specific files (patches welcome).
 * PREALLOC extents and extents containing blocks filled with zeros will
   be replaced by holes. There is no way to turn this off.
-* Consecutive runs of duplicate blocks that are less than 12K in length
-  can take 30% of the processing time while saving only 3% of the disk
-  space. There should be an option to just not bother with those, but it's
-  complicated by the btrfs requirement to always dedupe complete extents.
-* There is a lot of duplicate reading of blocks in snapshots. bees will
-  scan all snapshots at close to the same time to try to get better
-  performance by caching, but really fixing this requires rewriting the
-  crawler to scan the btrfs extent tree directly instead of the subvol
-  FS trees.
+* The fundamental unit of deduplication is the extent _reference_, when
+  it should be the _extent_ itself. This is an architectural limitation
+  that results in excess reads of extent data, even in the Extent scan mode.
 * Block reads are currently more allocation- and CPU-intensive than they
   should be, especially for filesystems on SSD where the IO overhead is
@@ -33,8 +26,9 @@ much smaller. This is a problem for CPU-power-constrained environments
 * bees can currently fragment extents when required to remove duplicate
   blocks, but has no defragmentation capability yet. When possible, bees
-  will attempt to work with existing extent boundaries, but it will not
-  aggregate blocks together from multiple extents to create larger ones.
+  will attempt to work with existing extent boundaries and choose the
+  largest fragments available, but it will not aggregate blocks together
+  from multiple extents to create larger ones.
 * When bees fragments an extent, the copied data is compressed. There
   is currently no way (other than by modifying the source) to select a
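
For readers unfamiliar with the first bullet in the diff above, "replaced by holes" refers to hole punching, the standard Linux way to release the storage behind a byte range without changing the file size. The sketch below is illustrative only and is not bees source; the file name, offset, and length are hypothetical.

```c
/* Illustrative sketch only (not bees code): how a zero-filled or PREALLOC
 * range can be "replaced by a hole" on Linux.  The file name, offset and
 * length here are hypothetical. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("some-file", O_RDWR);          /* hypothetical path */
    if (fd < 0) { perror("open"); return 1; }

    /* Punch a hole over 128 KiB at offset 1 MiB.  FALLOC_FL_KEEP_SIZE is
     * mandatory with FALLOC_FL_PUNCH_HOLE, so the file length is unchanged
     * while the underlying blocks are released. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  1 << 20, 128 << 10) < 0)
        perror("fallocate");

    close(fd);
    return 0;
}
```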
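The "extent _reference_" limitation follows from the kernel interface that deduplication goes through: FIDEDUPERANGE operates on byte ranges within files (that is, on references), not on extents directly. A minimal, hypothetical sketch of that ioctl is shown below; it is not bees code, which layers its hash table and extent scanning on top of this call.

```c
/* Minimal, hypothetical sketch of the kernel dedupe interface
 * (FIDEDUPERANGE).  File names and range sizes are made up. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int src = open("file-with-original-copy", O_RDONLY);   /* hypothetical */
    int dst = open("file-with-duplicate-copy", O_RDWR);    /* hypothetical */
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    /* One destination range.  The kernel compares both ranges itself and
     * only shares the data if the bytes are identical. */
    struct file_dedupe_range *arg =
        calloc(1, sizeof(*arg) + sizeof(struct file_dedupe_range_info));
    if (!arg) return 1;
    arg->src_offset = 0;
    arg->src_length = 128 << 10;        /* a whole (hypothetical) extent */
    arg->dest_count = 1;
    arg->info[0].dest_fd = dst;
    arg->info[0].dest_offset = 0;

    if (ioctl(src, FIDEDUPERANGE, arg) < 0)
        perror("FIDEDUPERANGE");
    else
        printf("bytes_deduped=%llu status=%d\n",
               (unsigned long long)arg->info[0].bytes_deduped,
               arg->info[0].status);

    free(arg);
    close(src);
    close(dst);
    return 0;
}
```

Because the call takes file descriptors and offsets, each duplicate reference has to be fed through it separately, which is the excess-read cost the bullet describes.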