mirror of https://github.com/Zygo/bees.git, synced 2025-05-17 13:25:45 +02:00
docs: toxic extents and btrfs send
Update documentation of toxic extent / slow backref workaround.
Add notes about btrfs send kernel bugs and incremental send failures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
parent 688d0dc014
commit 19859b0a0d
@@ -38,19 +38,23 @@ Unfixed kernel bugs (as of 4.14.71):
 `rsync` is copying from, while `rsync` will rename the new file over
 the old file to replace it.
 
+* **btrfs send** has various problems when bees is deduping RO snapshots,
+especially if the snapshot is used as a parent for incremental send.
+
 Minor kernel problems with workarounds:
 
-* **Slow backrefs** (aka toxic extents): If the number of references to a
-single shared extent within a single file grows above a few thousand,
-the kernel consumes CPU for minutes at a time while holding various
-locks that block access to the filesystem. bees avoids this bug
-by measuring the time the kernel spends performing `LOGICAL_INO`
-operations and permanently blacklisting any extent or hash involved
-where the kernel starts to get slow. Inside bees, such blocks are
-known as 'toxic' hash/block addresses.
+* **Slow backrefs** (aka toxic extents): Under certain conditions,
+if the number of references to a single shared extent grows too high,
+the kernel consumes more and more CPU while holding locks that block
+access to the filesystem. bees avoids this bug by measuring the time
+the kernel spends performing `LOGICAL_INO` operations and permanently
+blacklisting any extent or hash involved where the kernel starts
+to get slow. In the bees log, such blocks are labelled as 'toxic'
+hash/block addresses.
 
 * **`FILE_EXTENT_SAME` is arbitrarily limited to 16MB**. This is
 less than 128MB which is the maximum extent size that can be created
 by defrag, prealloc, or filesystems without the `compress-force`
 mount option. bees avoids feedback loops this can generate while
 attempting to replace extents over 16MB in length.
 Older kernels:
 
 * Older kernels have various data corruption and deadlock/hang issues
 that are no longer listed here, and older kernels are missing important
 features such as `LOGICAL_INO_V2`. Using an older kernel is not
 recommended.
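The 16MB `FILE_EXTENT_SAME` limit described in the hunk above means that a caller wanting to dedupe a longer range has to issue several requests. The Python sketch below shows only that splitting arithmetic. It is not how bees handles large extents (the text says only that bees avoids the resulting feedback loops). `split_dedupe_range` is a hypothetical helper, and the limit is assumed to be 16 × 1024 × 1024 bytes.

```python
# Assumption: the "16MB" limit in the text is 16 * 1024 * 1024 bytes.
MAX_DEDUPE_BYTES = 16 * 1024 * 1024


def split_dedupe_range(offset, length, max_bytes=MAX_DEDUPE_BYTES):
    """Yield (offset, length) pieces, each at most max_bytes long.

    Hypothetical helper for illustration; it performs no ioctl itself.
    """
    if length < 0 or max_bytes <= 0:
        raise ValueError("length must be >= 0 and max_bytes must be > 0")
    end = offset + length
    while offset < end:
        piece = min(max_bytes, end - offset)
        yield offset, piece
        offset += piece
```

For example, a 128MB extent (the maximum size quoted above) would be covered by eight pieces under this assumption.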
@@ -31,7 +31,8 @@ bees has been tested in combination with the following, and various problems are
 * btrfs send: some kernel versions have bugs in btrfs send that can be
 triggered by bees. The send can be restarted and will work if bees
 has finished processing the snapshot being sent. No data corruption
-observed other than the truncated send.
+observed other than the truncated send. Incremental send doesn't seem
+to work with bees running on the sending side.
 * btrfs qgroups: very slow, sometimes hangs...and it's even worse when
 bees is running.
 * btrfs autodefrag mount option: hangs and high CPU usage problems
@@ -78,22 +78,16 @@ Other Gotchas
 -------------
 
 * bees avoids the [slow backrefs kernel bug](btrfs-kernel.md) by
-measuring the time required to perform `LOGICAL_INO` operations. If an
-extent requires over 10 seconds to perform a `LOGICAL_INO` then bees
-blacklists the extent and avoids referencing it in future operations.
-In most cases, fewer than 0.1% of extents in a filesystem must be
-avoided this way. This results in short write latency spikes of up
-to and a little over 10 seconds as btrfs will not allow writes to the
+measuring the time required to perform `LOGICAL_INO` operations.
+If an extent requires over 0.1 kernel CPU seconds to perform a
+`LOGICAL_INO` ioctl, then bees blacklists the extent and avoids
+referencing it in future operations. In most cases, fewer than 0.1%
+of extents in a filesystem must be avoided this way. This results
+in short write latency spikes as btrfs will not allow writes to the
 filesystem while `LOGICAL_INO` is running. Generally the CPU spends
 most of the runtime of the `LOGICAL_INO` ioctl running the kernel,
-so on a single-core CPU the entire system can freeze up for a few
-seconds at a time.
-
-* Load managers that send a `SIGSTOP` to the bees process to throttle
-CPU usage may affect the `LOGICAL_INO` timing mechanism, causing extents
-to be incorrectly labelled 'toxic'. This will cause a small reduction
-of dedupe hit rate. Slow and heavily loaded disks can trigger the same
-effect if `LOGICAL_INO` takes too long due to IO latency.
+so on a single-core CPU the entire system can freeze up for a second
+during operations on toxic extents.
 
 * If a process holds a directory FD open, the subvol containing the
 directory cannot be deleted (`btrfs sub del` will start the deletion
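The timing check described in the last hunk can be sketched roughly as follows. This is an illustrative Python sketch, not bees's actual implementation: `ToxicExtentTracker`, `resolve`, and the `lookup` callable are hypothetical names, the blacklist is an in-memory set rather than anything persistent, and kernel CPU time is approximated with the process-wide `ru_stime` from `resource.getrusage`. The 0.1 second threshold is the figure quoted in the new text.

```python
import resource

# Threshold quoted in the updated documentation text (kernel CPU seconds).
TOXIC_THRESHOLD_SECONDS = 0.1


def kernel_cpu_seconds():
    """Kernel (system) CPU time used by this process so far, in seconds."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_stime


class ToxicExtentTracker:
    """Hypothetical sketch: time each lookup and blacklist slow extents."""

    def __init__(self, threshold=TOXIC_THRESHOLD_SECONDS, clock=kernel_cpu_seconds):
        self.threshold = threshold
        self.clock = clock      # injectable so the logic can be tested
        self.toxic = set()      # assumption: in-memory only, not persisted

    def resolve(self, extent_addr, lookup):
        """Call lookup(extent_addr) unless the extent is blacklisted.

        Returns the lookup result, or None if the extent was skipped.
        The extent is blacklisted if the lookup used more than
        `threshold` seconds according to `clock`.
        """
        if extent_addr in self.toxic:
            return None
        start = self.clock()
        result = lookup(extent_addr)
        elapsed = self.clock() - start
        if elapsed > self.threshold:
            self.toxic.add(extent_addr)
        return result
```

Here `lookup` stands in for whatever performs the `LOGICAL_INO` operation. Measuring kernel CPU time rather than wall-clock time is consistent with the removed paragraph's concern that `SIGSTOP` throttling or slow disks could make extents look toxic when they are not.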