docs: add scan mode 4, "extent"

Extent is a different kind of scan mode, so introduce the concept of the two kinds of scan mode, and rearrange the description of scan modes along the new boundaries.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-12 05:12:25 +02:00 · 2023-01-05 01:06:06 -05:00
parent c1af219246
commit 25f7ced27b
2 changed files with 147 additions and 55 deletions
--- a/docs/config.md
+++ b/docs/config.md
@@ -98,27 +98,72 @@ code files over and over, so it will need a smaller hash table than a
 backup server which has to refer to the oldest data on the filesystem
 every time a new client machine's data is added to the server.
-Scanning modes for multiple subvols
+Scanning modes
-----------------------------------
+--------------
-The `--scan-mode` option affects how bees schedules worker threads
+The `--scan-mode` option affects how bees iterates over the filesystem,
-between subvolumes.  Scan modes are an experimental feature and will
+schedules extents for scanning, and tracks progress.
 likely be deprecated in favor of a better solution.
-Scan mode can be changed at any time by restarting bees with a different
+There are now two kinds of scan mode:  the legacy **subvol** scan modes,
-mode option.  Scan state tracking is the same for all of the currently
+and the new **extent** scan mode.
 implemented modes.  The difference between the modes is the order in
 which subvols are selected.
-If a filesystem has only one subvolume with data in it, then the
+Scan mode can be changed by restarting bees with a different scan mode
-`--scan-mode` option has no effect.  In this case, there is only one
+option.
 subvolume to scan, so worker threads will all scan that one.
-Within a subvol, there is a single optimal scan order:  files are scanned
+Extent scan mode:
-in ascending numerical inode order.  Each worker will scan a different
+
-inode to avoid having the threads contend with each other for locks.
+ * Works with 4.15 and later kernels.
-File data is read sequentially and in order, but old blocks from earlier
+ * Can estimate progress and provide an ETA.
-scans are skipped.
+ * Can optimize scanning order to dedupe large extents first.
 * Cannot avoid modifying read-only subvols.
 * Can keep up with frequent creation and deletion of snapshots.
 Subvol scan modes:
 * Work with 4.14 and earlier kernels.
 * Cannot estimate or report progress.
 * Cannot optimize scanning order by extent size.
 * Can avoid modifying read-only subvols (for `btrfs send` workaround).
 * Have problems keeping up with snapshots created during a scan.
 The default scan mode is 1, "independent".
 If you are using bees for the first time on a filesystem with many
 existing snapshots, you should read about [snapshot gotchas](gotchas.md).
 Subvol scan modes
 -----------------
 Subvol scan modes are maintained for compatibility with existing
 installations, but will not be developed further.  New installations
 should use extent scan mode instead.
 The _quantity_ of text below detailing the shortcomings of each subvol
 scan mode should be informative all by itself.
 Subvol scan modes work on any kernel version supported by bees.  They
 are the only scan modes usable on kernel 4.14 and earlier.
 The difference between the subvol scan modes is the order in which the
 files from different subvols are fed into the scanner.  They all scan
 files in inode number order, from low to high offset within each inode,
 the same way that a program like `cat` would read files (but skipping
 over old data from earlier btrfs transactions).
 If a filesystem has only one subvolume with data in it, then all of
 the subvol scan modes are equivalent.  In this case, there is only one
 subvolume to scan, so every possible ordering of subvols is the same.
 The `--workaround-btrfs-send` option pauses scanning subvols that are
 read-only.  If the subvol is made read-write (e.g. with `btrfs prop set
 $subvol ro false`), or if the `--workaround-btrfs-send` option is removed,
 then the scan of that subvol is unpaused and dedupe proceeds normally.
 Space will only be recovered when the last read-only subvol is deleted.
 Subvol scan modes cannot efficiently or accurately calculate an ETA for
 completion or estimate progress through the data.  They simply request
 "the next new inode" from btrfs, and they are completed when btrfs says
 there is no next new inode.
 Between subvols, there are several scheduling algorithms with different
 trade-offs:
@@ -126,53 +171,99 @@ trade-offs:
 Scan mode 0, "lockstep", scans the same inode number in each subvol at
 close to the same time.  This is useful if the subvols are snapshots
 with a common ancestor, since the same inode number in each subvol will
-have similar or identical contents.  This maximizes the likelihood
+have similar or identical contents.  This maximizes the likelihood that
-that all of the references to a snapshot of a file are scanned at
+all of the references to a snapshot of a file are scanned at close to
-close to the same time, improving dedupe hit rate and possibly taking
+the same time, improving dedupe hit rate.  If the subvols are unrelated
-advantage of VFS caching in the Linux kernel.  If the subvols are
+(i.e. not snapshots of a single subvol) then this mode does not provide
-unrelated (i.e. not snapshots of a single subvol) then this mode does
+any significant advantage.  This mode uses smaller amounts of temporary
-not provide significant benefit over random selection.  This mode uses
+space for shorter periods of time when most subvols are snapshots.  When a
-smaller amounts of temporary space for shorter periods of time when most
+new snapshot is created, this mode will stop scanning other subvols and
-subvols are snapshots.  When a new snapshot is created, this mode will
+scan the new snapshot until the same inode number is reached in each
-stop scanning other subvols and scan the new snapshot until the same
+subvol, which will effectively stop dedupe temporarily as this data has
-inode number is reached in each subvol, which will effectively stop
+already been scanned and deduped in the other snapshots.
 dedupe temporarily as this data has already been scanned and deduped
 in the other snapshots.
-Scan mode 1, "independent", scans the next inode with new data in each
+Scan mode 1, "independent", scans the next inode with new data in
-subvol.  Each subvol's scanner shares inodes uniformly with all other
+each subvol.  There is no coordination between the subvols, other than
-subvol scanners until the subvol has no new inodes left.  This mode makes
+round-robin distribution of files from each subvol to each worker thread.
-continuous forward progress across the filesystem and provides average
+This mode makes continuous forward progress in all subvols.  When a new
-performance across a variety of workloads, but is slow to respond to new
+snapshot is created, previous subvol scans continue as before, but the
-data, and may spend a lot of time deduping short-lived subvols that will
+worker threads are now divided among one more subvol.
 soon be deleted when it is preferable to dedupe long-lived subvols that
 will be the origin of future snapshots.  When a new snapshot is created,
 previous subvol scans continue as before, but the time is now divided
 among one more subvol.
 Scan mode 2, "sequential", scans one subvol at a time, in numerical subvol
-ID order, processing each subvol completely before proceeding to the
+ID order, processing each subvol completely before proceeding to the next
-next subvol.  This avoids spending time scanning short-lived snapshots
+subvol.  This avoids spending time scanning short-lived snapshots that
-that will be deleted before they can be fully deduped (e.g. those used
+will be deleted before they can be fully deduped (e.g. those used for
-for `btrfs send`).  Scanning is concentrated on older subvols that are
+`btrfs send`).  Scanning starts on older subvols that are more likely
-more likely to be origin subvols for future snapshots, eliminating the
+to be origin subvols for future snapshots, eliminating the need to
-need to dedupe future snapshots separately.  This mode uses the largest
+dedupe future snapshots separately.  This mode uses the largest amount
-amount of temporary space for the longest time, and typically requires
+of temporary space for the longest time, and typically requires a larger
-a larger hash table to maintain dedupe hit rate.
+hash table to maintain dedupe hit rate.
 Scan mode 3, "recent", scans the subvols with the highest `min_transid`
 value first (i.e. the ones that were most recently completely scanned),
 then falls back to "independent" mode to break ties.  This interrupts
-long scans of old subvols to give a rapid dedupe response to new data,
+long scans of old subvols to give a rapid dedupe response to new data
-then returns to the old subvols after the new data is scanned.  It is
+in previously scanned subvols, then returns to the old subvols after
-useful for large filesystems with multiple active subvols and rotating
+the new data is scanned.
 snapshots, where the first-pass scan can take months, but new duplicate
 data appears every day.
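
Reviewer's illustration (not part of the commit): a toy model of how modes 0, 1 and 2 order the same work. It assumes each subvol is just a list of inode numbers with new data, and it ignores worker threads, transid filtering, and mode 3 (which depends on per-subvol `min_transid` state). The function names are hypothetical and this is not how bees implements its schedulers.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Subvols = Dict[int, List[int]]   # subvol ID -> inode numbers with new data
Order = List[Tuple[int, int]]    # sequence of (subvol ID, inode number)


def lockstep(subvols: Subvols) -> Order:
    """Mode 0 (toy): lowest inode number first, across all subvols."""
    pairs = [(sv, ino) for sv, inos in subvols.items() for ino in inos]
    return sorted(pairs, key=lambda p: (p[1], p[0]))


def independent(subvols: Subvols) -> Order:
    """Mode 1 (toy): round-robin; each subvol advances through its own inodes."""
    queues = [[(sv, ino) for ino in sorted(inos)]
              for sv, inos in sorted(subvols.items())]
    order: Order = []
    for row in zip_longest(*queues):
        order.extend(p for p in row if p is not None)
    return order


def sequential(subvols: Subvols) -> Order:
    """Mode 2 (toy): finish each subvol, in subvol ID order, before the next."""
    return [(sv, ino) for sv in sorted(subvols) for ino in sorted(subvols[sv])]


example = {5: [257, 300, 301], 256: [258, 259]}
```

With the `example` input the three functions return three different orders, which makes the trade-offs in the paragraphs above easier to compare side by side.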
-The default scan mode is 1, "independent".
+Extent scan mode
 ----------------
-If you are using bees for the first time on a filesystem with many
+Scan mode 4, "extent", scans the extent tree instead of the subvol trees.
-existing snapshots, you should read about [snapshot gotchas](gotchas.md).
+Extent scan mode reads each extent once, regardless of the number of
 reflinks or snapshots.  It adapts to the creation of new snapshots
 immediately, without having to revisit old data.
 In the extent scan mode, extents are separated into multiple size tiers
 to prioritize large extents over small ones.  Deduping large extents
 keeps the metadata update cost low per block saved, resulting in faster
 dedupe at the start of a scan cycle.  This is important for maximizing
 performance in use cases where bees runs for a limited time, such as
 during an overnight maintenance window.
 Once the larger size tiers are completed, dedupe space recovery speeds
 slow down significantly.  It may be desirable to stop bees running once
 the larger size tiers are finished, then start bees running some time
 later after new data has appeared.
 Each extent is mapped in physical address order, and all extent references
 are submitted to the scanner at the same time, resulting in much better
 cache behavior and dedupe performance compared to the subvol scan modes.
 The "extent" scan mode is not usable on kernels before 4.15 because
 it relies on the `LOGICAL_INO_V2` ioctl added in that kernel release.
 When using bees with an older kernel, only subvol scan modes will work.
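
Reviewer's illustration (not part of the commit): a minimal sketch of choosing a `--scan-mode` value from the kernel release string, based on the 4.15 threshold stated above. The helper names are hypothetical, falling back to mode 1 is an assumption drawn from the documented default, and a version-string check is only a heuristic (it says nothing about vendor kernels with backported features).

```python
import re


def supports_extent_scan(release: str) -> bool:
    """Return True if the release string is 4.15 or later."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 15)


def suggested_scan_mode(release: str) -> int:
    """Mode 4 ("extent") where available, else mode 1 ("independent")."""
    return 4 if supports_extent_scan(release) else 1
```

On a running system the release string could come from `platform.release()` in the Python standard library.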
 Extents are divided into virtual subvols by size, using reserved btrfs
 subvol IDs 250..255.  The size tier groups are:
 * 250: 32M+1 and larger
 * 251: 8M+1..32M
 * 252: 2M+1..8M
 * 253: 512K+1..2M
 * 254: 128K+1..512K
 * 255: 128K and smaller (includes all compressed extents)
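
Reviewer's illustration (not part of the commit): a minimal sketch of the size-tier table above as a lookup function. It assumes K and M are binary units (1024 and 1024² bytes) and that the tier is decided by extent length alone; the list says compressed extents all land in tier 255, but whether bees decides that by size or by a separate rule is not stated here, so the sketch does not model it. The names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# (inclusive upper bound in bytes, virtual subvol ID), smallest tier first.
TIER_UPPER_BOUNDS = [
    (128 * KIB, 255),
    (512 * KIB, 254),
    (2 * MIB, 253),
    (8 * MIB, 252),
    (32 * MIB, 251),
]
LARGEST_TIER = 250  # 32M+1 and larger


def size_tier(extent_bytes: int) -> int:
    """Map an extent length in bytes to its virtual subvol ID (250..255)."""
    if extent_bytes <= 0:
        raise ValueError("extent length must be positive")
    for upper_bound, subvol_id in TIER_UPPER_BOUNDS:
        if extent_bytes <= upper_bound:
            return subvol_id
    return LARGEST_TIER
```

The boundaries are inclusive on the upper side, so an extent of exactly 128K stays in tier 255 and one byte more moves it to tier 254.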
 Extent scan mode can efficiently calculate dedupe progress within
 the filesystem and estimate an ETA for completion within each size
 tier; however, the accuracy of the ETA can be questionable due to the
 non-uniform distribution of block addresses in a typical user filesystem.
 Older versions of bees do not recognize the virtual subvols, so running
 an old bees version after running a new bees version will reset the
 "extent" scan mode's progress in `beescrawl.dat` to the beginning.
 This may change in future bees releases, i.e. extent scans will store
 their checkpoint data somewhere else.
 The `--workaround-btrfs-send` option behaves differently in extent
 scan modes:  In extent scan mode, dedupe proceeds on all subvols that are
 read-write, but all subvols that are read-only are excluded from dedupe.
 Space will only be recovered when the last read-only subvol is deleted.
 During `btrfs send` all duplicate extents in the sent subvol will not be
 removed (the kernel will reject dedupe commands while send is active,
 and bees currently will not re-issue them after the send is complete).
 It may be preferable to terminate the bees process while running `btrfs
 send` in extent scan mode, and restart bees after the `send` is complete.
 Threads and load management
 ---------------------------
--- a/docs/options.md
+++ b/docs/options.md
@@ -47,6 +47,7 @@
  * Mode 1: independent
  * Mode 2: sequential
  * Mode 3: recent
  * Mode 4: extent
 For details of the different scanning modes and the default value of
 this option, see [bees configuration](config.md).