From dceea2ebbcd927110b06dd21eceab705ce75735d Mon Sep 17 00:00:00 2001 From: Zygo Blaxell Date: Fri, 9 Oct 2020 15:51:03 -0400 Subject: [PATCH] docs: improve send workaround text, add references to backref commits, make grammar more good now Rewrite the text related to 'btrfs send' to clarify that the send workaround is no longer necessary to avoid kernel crashes, but still useful because send and dedupe still do not work at the same time. Replace "many backref code changes" with a specific commit reference, and improve the grammar of some issue descriptions. Signed-off-by: Zygo Blaxell --- docs/btrfs-kernel.md | 74 ++++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 34 deletions(-) diff --git a/docs/btrfs-kernel.md b/docs/btrfs-kernel.md index 991ad7d..5c02077 100644 --- a/docs/btrfs-kernel.md +++ b/docs/btrfs-kernel.md @@ -44,15 +44,15 @@ These bugs are particularly popular among bees users: | 5.1 | 5.4 | metadata corruption resulting in loss of filesystem when a write operation occurs while balance starts a new block group. **Do not use kernel 5.1 with btrfs.** Kernel 5.2 and 5.3 have workarounds that may detect corruption in progress and abort before it becomes permanent, but do not prevent corruption from occurring. | 5.4.14, 5.5 and later | 6282675e6708 btrfs: relocation: fix reloc_root lifespan and access | 4.5, backported to 3.18.31, 4.1.22, 4.4.4 | 5.5 | `df` incorrectly reports 0 free space while data space is available. Triggered by changes in metadata size, including those typical of large-scale dedupe. Occurs more often starting in 5.3 and especially 5.4 | 4.4.213, 4.9.213, 4.14.170, 4.19.102, 5.4.18, 5.5.2, 5.6 and later | d55966c4279b btrfs: do not zero f_bavail if we have available space | - | 5.5 | kernel crashes due to various tree mod log issues (often triggered by bees) | 3.16.84, 4.4.214, 4.9.214, 4.14.171, 4.19.103, 5.4.19, 5.5.3, 5.6 and later | at least 3, last is 7227ff4de55d Btrfs: fix race between adding and putting tree mod seq elements and nodes -| 5.0 | 5.5 | last extent in file not removed by dedupe if file size is not a multiple of 4K | 5.4.19, 5.5.3, 5.6 and later | 831d2fa25ab8 Btrfs: make deduplication with range including the last block work +| 5.0 | 5.5 | dedupe fails to remove the last extent in a file if the file size is not a multiple of 4K | 5.4.19, 5.5.3, 5.6 and later | 831d2fa25ab8 Btrfs: make deduplication with range including the last block work | - | 5.6 | deadlock when enumerating file references to physical extent addresses while some references still exist in deleted subvols | 5.7 and later | 39dba8739c4e btrfs: do not resolve backrefs for roots that are being deleted | - | 5.6 | deadlock when many extent reference updates are pending and available memory is low | 4.14.177, 4.19.116, 5.4.33, 5.5.18, 5.6.5, 5.7 and later | 351cbf6e4410 btrfs: use nofs allocations for running delayed items -| - | 5.6 | excessive CPU usage and btrfs write latency when translating from extent physical address to list of referencing files and offsets (`LOGICAL_INO` ioctl) | 5.7 and later | many backref code changes in kernel 5.7, also improvements in many earlier kernels +| - | 5.6 | excessive CPU usage in `LOGICAL_INO` ioctl and increased btrfs write latency in other processes when bees translates from extent physical address to list of referencing files and offsets | 5.7 and later | b25b0b871f20 btrfs: backref, use correct count to resolve normal data refs, plus 3 parent commits. Some improvements also in earlier kernels. | - | 5.7 | filesystem becomes read-only if out of space while deleting snapshot | 4.9.238, 4.14.200, 4.19.149, 5.4.69, 5.8 and later | 7c09c03091ac btrfs: don't force read-only after error in drop snapshot -| 5.1 | 5.7 | balance, device delete, or filesystem shrink operations loop endlessly on a single block group, extent count does not decrease | 5.4.54, 5.7.11, 5.8 and later | 1dae7e0e58b4 btrfs: reloc: clear DEAD\_RELOC\_TREE bit for orphan roots to prevent runaway balance +| 5.1 | 5.7 | balance, device delete, or filesystem shrink operations loop endlessly on a single block group without decreasing extent count | 5.4.54, 5.7.11, 5.8 and later | 1dae7e0e58b4 btrfs: reloc: clear DEAD\_RELOC\_TREE bit for orphan roots to prevent runaway balance | - | 5.8 | deadlock in `TREE_SEARCH` ioctl (core component of bees filesystem scanner), followed by regression in deadlock fix | 4.4.237, 4.9.237, 4.14.199, 4.19.146, 5.4.66, 5.8.10 and later | a48b73eca4ce btrfs: fix potential deadlock in the search ioctl, 1c78544eaa46 btrfs: fix wrong address when faulting in pages in the search ioctl | 4.15 | - | spurious warnings from `fs/fs-writeback.c` when `flushoncommit` is enabled | - | workaround: comment out the `WARN_ON` -| 5.7 | - | kernel crash if balance receives fatal signal at wrong point during start of new block group | - | workaround: keep `btrfs balance` from being killed or Ctrl-Ced +| 5.7 | - | kernel crash if balance receives fatal signal at wrong moment during start of new block group | - | workaround: keep `btrfs balance` from being killed or Ctrl-Ced "Last bad kernel" refers to that version's last stable update from kernel.org. Distro kernels may backport additional fixes. Consult @@ -86,54 +86,60 @@ Workarounds for known kernel bugs This workaround is not necessary for kernels 5.4.19, 5.5.3, 5.6 and later. * **Slow backrefs** (aka toxic extents): Under certain conditions, - if the number of references to a single shared extent grows too high, - the kernel consumes more and more CPU while holding locks that block - access to the filesystem. bees avoids this bug by measuring the time - the kernel spends performing `LOGICAL_INO` operations and permanently - blacklisting any extent or hash involved where the kernel starts - to get slow. In the bees log, such blocks are labelled as 'toxic' - hash/block addresses. Toxic extents are rare (about 1 in 100,000 - extents become toxic), but toxic extents can become 8 orders of - magnitude more expensive to process than the fastest non-toxic - extents. This seems to affect all dedupe agents on btrfs; at this - time of writing only bees has a workaround for this bug. + if the number of references to a single shared extent grows too + high, the kernel consumes more and more CPU while also holding locks + that delay write access to the filesystem. bees avoids this bug + by measuring the time the kernel spends performing `LOGICAL_INO` + operations and permanently blacklisting any extent or hash involved + where the kernel starts to get slow. In the bees log, such blocks + are labelled as 'toxic' hash/block addresses. Toxic extents are + rare (about 1 in 100,000 extents become toxic), but toxic extents can + become 8 orders of magnitude more expensive to process than the fastest + non-toxic extents. This seems to affect all dedupe agents on btrfs; + at this time of writing only bees has a workaround for this bug. Update (5.8.14): this issue may be a race condition that occurs if two or more threads attempt to modify the same extent or immediately adjacent extents. It has not been observed on a kernel version later than 5.7 (after backref code changes in the kernel). -* **`btrfs send` is incompatible with dedupe in old kernels**. - The bees option `--workaround-btrfs-send` prevents any modification - of read-only subvols in order to avoid breaking `btrfs send`. +* **dedupe breaks `btrfs send` in old kernels**. The bees option + `--workaround-btrfs-send` prevents any modification of read-only subvols + in order to avoid breaking `btrfs send`. - This workaround is not necessary for kernels 4.9.188, 4.14.137, 4.19.65, - 5.2.7, 5.3 and later. + This workaround is no longer necessary to avoid kernel crashes + and send performance failure on kernel 4.9.207, 4.14.159, 4.19.90, + 5.3.17, 5.4.4, 5.5 and later; however, some conflict between send + and dedupe still remains, so the workaround is still useful. + + `btrfs receive` is not affected by this issue. Unfixed kernel bugs ------------------- As of 5.8.14: -* **`btrfs send` still cannot run at the same time as dedupe**, even - with all current fixes. Recent kernels refuse the dedupe operation with - error `EAGAIN`. If `btrfs send` is invoked at the instant when a dedupe - operation is running, `send` will fail to start and return an error. +* **The kernel does not permit `btrfs send` and dedupe to run at the + same time**. Recent kernels no longer crash, but now refuse one + operation with an error if the other operation was already running. bees has not been updated to handle the new dedupe behavior optimally. - Optimal behavior is to defer dedupe operations until after the send is - finished. Current bees behavior is to complain loudly about the dedupe - failure in log messages, and abandon the duplicate data references - in the sending snapshot. A future bees version shall have better - handling for this situation. + Optimal behavior is to defer dedupe operations when send is detected, + and resume after the send is finished. Current bees behavior is to + complain loudly about each individual dedupe failure in log messages, + and abandon duplicate data references in the snapshot that send is + processing. A future bees version shall have better handling for + this situation. Workaround: send `SIGSTOP` to bees, or terminate the bees process, - while `btrfs send` is running. This workaround is not required if - snapshot is deleted after sending--in that case, any duplicate data - blocks that were not removed by dedupe will be removed by snapshot - delete instead. + before running `btrfs send`. - `btrfs receive` is not affected by these issues. + This workaround is not strictly required if snapshot is deleted after + sending. In that case, any duplicate data blocks that were not removed + by dedupe will be removed by snapshot delete instead. The workaround + still saves some IO. + + `btrfs receive` is not affected by this issue. * **Spurious warnings in `fs/fs-writeback.c`** on kernel 4.15 and later when filesystem is mounted with `flushoncommit`. These