docs: toxic extents and btrfs send

Update documentation of toxic extent / slow backref workaround.
Add notes about btrfs send kernel bugs and incremental send failures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2026-01-08 20:00:22 +01:00 · 2018-11-06 00:56:06 -05:00
parent 688d0dc014
commit 19859b0a0d
3 changed files with 27 additions and 28 deletions
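
Reader's note: the timing rule described in the `docs/gotchas.md` hunk of this commit (blacklist an extent once a single `LOGICAL_INO` ioctl uses more than 0.1 kernel CPU seconds) can be sketched roughly as follows. This is a minimal illustration, not the bees implementation: `ToxicTracker`, `resolve`, and the `lookup` callable are hypothetical names, the real `LOGICAL_INO` ioctl is stood in for by an arbitrary callable, and `os.times().system` measures process-wide kernel CPU time, which may differ from what bees actually measures.

```python
import os
from typing import Callable, List, Optional, Set

# Threshold taken from the gotchas.md text in this commit.
TOXIC_THRESHOLD_SECONDS = 0.1


def kernel_cpu_seconds() -> float:
    """Kernel (system) CPU time consumed so far by this process."""
    return os.times().system


class ToxicTracker:
    """Hypothetical helper: times a lookup and blacklists slow addresses."""

    def __init__(
        self,
        clock: Callable[[], float] = kernel_cpu_seconds,
        threshold: float = TOXIC_THRESHOLD_SECONDS,
    ) -> None:
        self._clock = clock
        self._threshold = threshold
        self.toxic: Set[int] = set()

    def resolve(
        self, addr: int, lookup: Callable[[int], List[str]]
    ) -> Optional[List[str]]:
        # Never touch an address that was already found to be slow.
        if addr in self.toxic:
            return None
        start = self._clock()
        refs = lookup(addr)  # stand-in for the LOGICAL_INO ioctl
        elapsed = self._clock() - start
        # Assumption: the result of the slow call is still returned;
        # only future lookups of this address are skipped.
        if elapsed > self._threshold:
            self.toxic.add(addr)
        return refs
```

The clock is injectable so the rule can be exercised without a real slow extent. Because the measurement is CPU time rather than wall-clock time, a caller that wanted to mirror the documented behaviour on a multi-threaded process would likely need a per-thread measurement instead.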
--- a/docs/btrfs-kernel.md
+++ b/docs/btrfs-kernel.md
@@ -38,19 +38,23 @@ Unfixed kernel bugs (as of 4.14.71):
  `rsync` is copying from, while `rsync` will rename the new file over
  the old file to replace it.
 * **btrfs send** has various problems when bees is deduping RO snapshots,
  especially if the snapshot is used as a parent for incremental send.
 Minor kernel problems with workarounds:
-* **Slow backrefs** (aka toxic extents): If the number of references to a
-  single shared extent within a single file grows above a few thousand,
-  the kernel consumes CPU for minutes at a time while holding various
-  locks that block access to the filesystem.  bees avoids this bug
-  by measuring the time the kernel spends performing `LOGICAL_INO`
-  operations and permanently blacklisting any extent or hash involved
-  where the kernel starts to get slow.  Inside bees, such blocks are
-  known as 'toxic' hash/block addresses.
+* **Slow backrefs** (aka toxic extents):  Under certain conditions,
+  if the number of references to a single shared extent grows too high,
+  the kernel consumes more and more CPU while holding locks that block
+  access to the filesystem.  bees avoids this bug by measuring the time
+  the kernel spends performing `LOGICAL_INO` operations and permanently
+  blacklisting any extent or hash involved where the kernel starts
+  to get slow.  In the bees log, such blocks are labelled as 'toxic'
+  hash/block addresses.
-* **`FILE_EXTENT_SAME` is arbitrarily limited to 16MB**.  This is
-  less than 128MB which is the maximum extent size that can be created
-  by defrag, prealloc, or filesystems without the `compress-force`
-  mount option.  bees avoids feedback loops this can generate while
-  attempting to replace extents over 16MB in length.
+Older kernels:
+
+* Older kernels have various data corruption and deadlock/hang issues
+  that are no longer listed here, and older kernels are missing important
+  features such as `LOGICAL_INO_V2`.  Using an older kernel is not
  recommended.
--- a/docs/btrfs-other.md
+++ b/docs/btrfs-other.md
@@ -31,7 +31,8 @@ bees has been tested in combination with the following, and various problems are
 * btrfs send:  some kernel versions have bugs in btrfs send that can be
  triggered by bees.  The send can be restarted and will work if bees
  has finished processing the snapshot being sent.  No data corruption
-  observed other than the truncated send.
+  observed other than the truncated send.  Incremental send doesn't seem
  to work with bees running on the sending side.
 * btrfs qgroups:  very slow, sometimes hangs...and it's even worse when
  bees is running.
 * btrfs autodefrag mount option:  hangs and high CPU usage problems
--- a/docs/gotchas.md
+++ b/docs/gotchas.md
@@ -78,22 +78,16 @@ Other Gotchas
 -------------
 * bees avoids the [slow backrefs kernel bug](btrfs-kernel.md) by
-  measuring the time required to perform `LOGICAL_INO` operations.  If an
-  extent requires over 10 seconds to perform a `LOGICAL_INO` then bees
-  blacklists the extent and avoids referencing it in future operations.
-  In most cases, fewer than 0.1% of extents in a filesystem must be
-  avoided this way.  This results in short write latency spikes of up
-  to and a little over 10 seconds as btrfs will not allow writes to the
+  measuring the time required to perform `LOGICAL_INO` operations.
+  If an extent requires over 0.1 kernel CPU seconds to perform a
+  `LOGICAL_INO` ioctl, then bees blacklists the extent and avoids
+  referencing it in future operations.  In most cases, fewer than 0.1%
+  of extents in a filesystem must be avoided this way.  This results
+  in short write latency spikes as btrfs will not allow writes to the
  filesystem while `LOGICAL_INO` is running.  Generally the CPU spends
  most of the runtime of the `LOGICAL_INO` ioctl running the kernel,
-  so on a single-core CPU the entire system can freeze up for a few
-  seconds at a time.
+  so on a single-core CPU the entire system can freeze up for a second
+  during operations on toxic extents.
 * Load managers that send a `SIGSTOP` to the bees process to throttle
  CPU usage may affect the `LOGICAL_INO` timing mechanism, causing extents
  to be incorrectly labelled 'toxic'.  This will cause a small reduction
  of dedupe hit rate.  Slow and heavily loaded disks can trigger the same
  effect if `LOGICAL_INO` takes too long due to IO latency.
 * If a process holds a directory FD open, the subvol containing the
  directory cannot be deleted (`btrfs sub del` will start the deletion