docs: config.md updates

The theories behind bees slowing down when presented with a larger hash
table turned out to be wrong. The real cause was a very old bug which
submitted thousands of `LOGICAL_INO` requests when only a handful of
requests were needed.

"Compression on the filesystem" -> "Compression in files"

Don't be so "dramatic". Be "rapid" instead.

Remove "cannot avoid modifying read-only snapshots" as a distinction
between subvol and extent scans. Both modes support send workaround and
send waiting with no significant distinction. Emphasize extent scan's
better handling of many snapshots. Also reflinks.

Add some discussion of `--throttle-factor`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>

parent 7fcde97b70
commit ac581273d3

@@ -26,11 +26,7 @@ Here are some numbers to estimate appropriate hash table sizes:
 Notes:
 
 * If the hash table is too large, no extra dedupe efficiency is
-obtained, and the extra space wastes RAM. If the hash table contains
-more block records than there are blocks in the filesystem, the extra
-space can slow bees down. A table that is too large prevents obsolete
-data from being evicted, so bees wastes time looking for matching data
-that is no longer present on the filesystem.
+obtained, and the extra space wastes RAM.
 
 * If the hash table is too small, bees extrapolates from matching
 blocks to find matching adjacent blocks in the filesystem that have been
@@ -59,19 +55,19 @@ patterns on dedupe effectiveness without performing deep inspection of
 both the filesystem data and its structure--a task that is as expensive
 as performing the deduplication.
 
-* **Compression** on the filesystem reduces the average extent length
-compared to uncompressed filesystems. The maximum compressed extent
-length on btrfs is 128KB, while the maximum uncompressed extent length
-is 128MB. Longer extents decrease the optimum hash table size while
-shorter extents increase the optimum hash table size because the
-probability of a hash table entry being present (i.e. unevicted) in
-each extent is proportional to the extent length.
+* **Compression** in files reduces the average extent length compared
+to uncompressed files. The maximum compressed extent length on
+btrfs is 128KB, while the maximum uncompressed extent length is 128MB.
+Longer extents decrease the optimum hash table size while shorter extents
+increase the optimum hash table size, because the probability of a hash
+table entry being present (i.e. unevicted) in each extent is proportional
+to the extent length.
 
 As a rule of thumb, the optimal hash table size for a compressed
 filesystem is 2-4x larger than the optimal hash table size for the same
-data on an uncompressed filesystem. Dedupe efficiency falls dramatically
-with hash tables smaller than 128MB/TB as the average dedupe extent size
-is larger than the largest possible compressed extent size (128KB).
+data on an uncompressed filesystem. Dedupe efficiency falls rapidly with
+hash tables smaller than 128MB/TB as the average dedupe extent size is
+larger than the largest possible compressed extent size (128KB).
 
 * **Short writes or fragmentation** also shorten the average extent
 length and increase optimum hash table size. If a database writes to
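
A worked example of the sizing rules in the hunk above may help. This is
not part of the patch; the 6 TB figure, the 4x compression multiplier, and
the beesd `DB_SIZE` variable are illustrative assumptions (the variable name
follows the sample config shipped with bees; verify against your installed copy):

```sh
# Rule-of-thumb hash table sizing sketch (illustrative values only).
# Floor from the text above: roughly 128 MB of hash table per TB of unique
# data, and 2-4x that for mostly-compressed data, because compressed
# extents are capped at 128KB and therefore much shorter on average.
UNIQUE_DATA_TB=6                          # assumed amount of unique data
BASE_MB=$((UNIQUE_DATA_TB * 128))         # uncompressed floor: 768 MB
COMPRESSED_MB=$((BASE_MB * 4))            # compressed upper estimate: 3072 MB
echo "suggested hash table size: ${BASE_MB}-${COMPRESSED_MB} MB"

# With the beesd wrapper, the hash table size is normally set through the
# DB_SIZE variable in its config file, in bytes:
#   DB_SIZE=$((3 * 1024 * 1024 * 1024))   # 3 GiB
```
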
@@ -115,7 +111,6 @@ Extent scan mode:
 * Works with 4.15 and later kernels.
 * Can estimate progress and provide an ETA.
 * Can optimize scanning order to dedupe large extents first.
-* Cannot avoid modifying read-only subvols.
 * Can keep up with frequent creation and deletion of snapshots.
 
 Subvol scan modes:
@@ -123,8 +118,7 @@ Subvol scan modes:
 * Work with 4.14 and earlier kernels.
 * Cannot estimate or report progress.
 * Cannot optimize scanning order by extent size.
-* Can avoid modifying read-only subvols (for `btrfs send` workaround).
-* Have problems keeping up with snapshots created during a scan.
+* Have problems keeping up with multiple snapshots created during a scan.
 
 The default scan mode is 4, "extent".
 
@@ -212,7 +206,7 @@ Extent scan mode
 Scan mode 4, "extent", scans the extent tree instead of the subvol trees.
 Extent scan mode reads each extent once, regardless of the number of
 reflinks or snapshots. It adapts to the creation of new snapshots
-immediately, without having to revisit old data.
+and reflinks immediately, without having to revisit old data.
 
 In the extent scan mode, extents are separated into multiple size tiers
 to prioritize large extents over small ones. Deduping large extents
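
For reference, the scan mode described in the hunk above is chosen on the
command line. This sketch is not part of the patch; the flag name follows
the bees options documentation as I understand it, and the beesd wrapper
invocation and UUID are placeholders:

```sh
# Explicitly request the extent scan mode (4), which the text above says is
# now the default; modes 0-3 select the older subvol scan modes.
beesd --scan-mode 4 01234567-89ab-cdef-0123-456789abcdef
```
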
@@ -280,6 +274,15 @@ loads are active on the system, and resumes bees when the other loads
 are inactive. This is configured with the [`--loadavg-target` and
 `--thread-min` options](options.md).
 
+bees can self-throttle operations that enqueue work within btrfs.
+These operations are not well controlled by features such as process
+priority or IO priority or ratelimiting, because the enqueued work
+is submitted to btrfs several seconds before btrfs performs the work.
+The [`--throttle-factor` option](options.md) tracks how long it takes
+btrfs to complete queued operations, and reduces bees's submission
+rate to match btrfs's completion rate (or a fraction thereof, to reduce
+system load).
+
 Log verbosity
 -------------
 
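
A combined sketch of the load-management knobs discussed in the hunk above.
The option names come from the text; the values, the UUID, and the use of
the beesd wrapper are assumptions for illustration, not recommendations:

```sh
# Pause bees when the rest of the system is busy, and additionally slow the
# rate at which bees queues dedupe work inside btrfs:
#   --loadavg-target 5.0    stop starting new work when loadavg rises above 5.0
#   --thread-min 1          keep at least one worker thread when throttled
#   --throttle-factor 0.5   submit queued btrfs operations at about half the
#                           rate btrfs completes them, lowering background load
beesd --loadavg-target 5.0 --thread-min 1 --throttle-factor 0.5 \
      01234567-89ab-cdef-0123-456789abcdef
```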