mirror of https://github.com/Zygo/bees.git (synced 2025-05-18 05:45:45 +02:00)
docs: update documentation for new 'recent' scan mode

Also attempted to clarify the descriptions of the modes based on
feedback and questions from users over the years.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
parent 03f809bf22
commit 984ceeb2a5
@@ -94,38 +94,76 @@ every time a new client machine's data is added to the server.
 Scanning modes for multiple subvols
 -----------------------------------
 
-The `--scan-mode` option affects how bees divides resources between
-subvolumes. This is particularly relevant when there are snapshots,
-as there are tradeoffs to be made depending on how snapshots are used
-on the filesystem.
+The `--scan-mode` option affects how bees schedules worker threads
+between subvolumes. Scan modes are an experimental feature and will
+likely be deprecated in favor of a better solution.
 
-Note that if a filesystem has only one subvolume (i.e. the root,
-subvol ID 5) then the `--scan-mode` option has no effect, as there is
-only one subvolume to scan.
+Scan mode can be changed at any time by restarting bees with a different
+mode option. Scan state tracking is the same for all of the currently
+implemented modes. The difference between the modes is the order in
+which subvols are selected.
 
-The default mode is mode 0, "lockstep". In this mode, each inode of each
-subvol is scanned at the same time, before moving to the next inode in
-each subvol. This maximizes the likelihood that all of the references to
-a snapshot of a file are scanned at the same time, which takes advantage
-of VFS caching in the Linux kernel. If snapshots are created very often,
-bees will not make very good progress as it constantly restarts the
-filesystem scan from the beginning each time a new snapshot is created.
+If a filesystem has only one subvolume with data in it, then the
+`--scan-mode` option has no effect. In this case, there is only one
+subvolume to scan, so worker threads will all scan that one.
 
-Scan mode 1, "independent", simply scans every subvol independently
-in parallel. Each subvol's scanner shares time equally with all other
-subvol scanners. Whenever a new subvol appears, a new scanner is
-created and the new subvol scanner doesn't affect the behavior of any
-existing subvol scanner.
+Within a subvol, there is a single optimal scan order: files are scanned
+in ascending numerical inode order. Each worker will scan a different
+inode to avoid having the threads contend with each other for locks.
+File data is read sequentially and in order, but old blocks from earlier
+scans are skipped.
 
-Scan mode 2, "sequential", processes each subvol completely before
-proceeding to the next subvol. This is a good mode when using bees for
-the first time on a filesystem that already has many existing snapshots
-and a high rate of new snapshot creation. Short-lived snapshots
-(e.g. those used for `btrfs send`) are effectively ignored, and bees
-directs its efforts toward older subvols that are more likely to be
-origin subvols for snapshots. By deduping origin subvols first, bees
-ensures that future snapshots will already be deduplicated and do not
-need to be deduplicated again.
+Between subvols, there are several scheduling algorithms with different
+trade-offs:
+
+Scan mode 0, "lockstep", scans the same inode number in each subvol at
+close to the same time. This is useful if the subvols are snapshots
+with a common ancestor, since the same inode number in each subvol will
+have similar or identical contents. This maximizes the likelihood
+that all of the references to a snapshot of a file are scanned at
+close to the same time, improving dedupe hit rate and possibly taking
+advantage of VFS caching in the Linux kernel. If the subvols are
+unrelated (i.e. not snapshots of a single subvol) then this mode does
+not provide significant benefit over random selection. This mode uses
+smaller amounts of temporary space for shorter periods of time when most
+subvols are snapshots. When a new snapshot is created, this mode will
+stop scanning other subvols and scan the new snapshot until the same
+inode number is reached in each subvol, which will effectively stop
+dedupe temporarily as this data has already been scanned and deduped
+in the other snapshots.
+
+Scan mode 1, "independent", scans the next inode with new data in each
+subvol. Each subvol's scanner shares inodes uniformly with all other
+subvol scanners until the subvol has no new inodes left. This mode makes
+continuous forward progress across the filesystem and provides average
+performance across a variety of workloads, but is slow to respond to new
+data, and may spend a lot of time deduping short-lived subvols that will
+soon be deleted when it is preferable to dedupe long-lived subvols that
+will be the origin of future snapshots. When a new snapshot is created,
+previous subvol scans continue as before, but the time is now divided
+among one more subvol.
+
+Scan mode 2, "sequential", scans one subvol at a time, in numerical subvol
+ID order, processing each subvol completely before proceeding to the
+next subvol. This avoids spending time scanning short-lived snapshots
+that will be deleted before they can be fully deduped (e.g. those used
+for `btrfs send`). Scanning is concentrated on older subvols that are
+more likely to be origin subvols for future snapshots, eliminating the
+need to dedupe future snapshots separately. This mode uses the largest
+amount of temporary space for the longest time, and typically requires
+a larger hash table to maintain dedupe hit rate.
+
+Scan mode 3, "recent", scans the subvols with the highest `min_transid`
+value first (i.e. the ones that were most recently completely scanned),
+then the highest `max_transid` (i.e. the ones that were created later),
+then falls back to "independent" mode to break ties. This interrupts
+long scans of old subvols to give a rapid dedupe response to new data,
+then returns to the old subvols after the new data is scanned. It is
+useful for large filesystems with multiple active subvols and rotating
+snapshots, where the first-pass scan can take months, but new duplicate
+data appears every day.
+
+The default scan mode is 1, "independent".
 
 If you are using bees for the first time on a filesystem with many
 existing snapshots, you should read about [snapshot gotchas](gotchas.md).
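
The four selection orders described in the new text can be illustrated with a small Python model. This is only a sketch under stated assumptions, not bees' actual scheduler: the `Subvol` class, the `scan_order` function, and the `turns` counter are hypothetical names invented for illustration, the transid values are treated as fixed (in bees they advance as scans complete), and "independent" is approximated as a simple round-robin.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Mode numbers follow the documentation above.
LOCKSTEP, INDEPENDENT, SEQUENTIAL, RECENT = 0, 1, 2, 3


@dataclass
class Subvol:
    """Hypothetical model of one subvol's scan state (not a bees type)."""
    subvol_id: int
    pending: List[int]      # inode numbers with new data, ascending
    min_transid: int = 0    # assumed static for this sketch
    max_transid: int = 0    # assumed static for this sketch
    turns: int = 0          # inodes handed out so far (round-robin helper)


def scan_order(mode: int, subvols: List[Subvol]) -> Iterator[Tuple[int, int]]:
    """Yield (subvol_id, inode) pairs in the order the given mode selects them."""
    # Work on copies so the caller's objects are not modified.
    live = [Subvol(s.subvol_id, list(s.pending), s.min_transid, s.max_transid)
            for s in subvols if s.pending]
    while live:
        if mode == LOCKSTEP:
            # Lowest next inode number first, so all subvols advance together.
            pick = min(live, key=lambda s: (s.pending[0], s.subvol_id))
        elif mode == INDEPENDENT:
            # Round-robin: the subvol that has had the fewest turns goes next.
            pick = min(live, key=lambda s: (s.turns, s.subvol_id))
        elif mode == SEQUENTIAL:
            # Finish the lowest subvol ID completely before moving on.
            pick = min(live, key=lambda s: s.subvol_id)
        elif mode == RECENT:
            # Highest min_transid, then highest max_transid, then round-robin.
            pick = min(live, key=lambda s: (-s.min_transid, -s.max_transid,
                                            s.turns, s.subvol_id))
        else:
            raise ValueError(f"unknown scan mode {mode}")
        inode = pick.pending.pop(0)   # within a subvol: ascending inode order
        pick.turns += 1
        yield (pick.subvol_id, inode)
        live = [s for s in live if s.pending]


subvols = [
    Subvol(256, [10, 20], min_transid=5, max_transid=9),
    Subvol(257, [15, 16], min_transid=7, max_transid=9),
]
for mode in (LOCKSTEP, INDEPENDENT, SEQUENTIAL, RECENT):
    print(mode, list(scan_order(mode, subvols)))
```

With these two made-up subvols, every mode visits the same four inodes and only the interleaving differs, which matches the statement that scan state tracking is shared and the modes differ only in selection order.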
@@ -40,16 +40,16 @@
 
 * `--scan-mode MODE` or `-m`
 
-Specify extent scanning algorithm. Default `MODE` is 0.
+Specify extent scanning algorithm. Default `MODE` is 3.
 **EXPERIMENTAL** feature that may go away.
 
-* Mode 0: scan extents in ascending order of (inode, subvol, offset).
-Keeps shared extents between snapshots together. Reads files sequentially.
-Minimizes temporary space usage.
-* Mode 1: scan extents from all subvols in parallel. Good performance
-on non-spinning media when subvols are unrelated.
-* Mode 2: scan all extents from one subvol at a time. Good sequential
-read performance for spinning media. Maximizes temporary space usage.
+* Mode 0: lockstep
+* Mode 1: independent
+* Mode 2: sequential
+* Mode 3: recent
+
+For details of the different scanning modes, see
+[bees configuration](docs/config.md).
 
 ## Workarounds
 