From 7fcde97b7032fc6f1f4ce1ba8a8b613b2bd08bad Mon Sep 17 00:00:00 2001
From: Zygo Blaxell
Date: Sat, 11 Jan 2025 01:01:40 -0500
Subject: [PATCH] docs: update the bug reporting and status instructions

Thread names have changed. Document some of the newer ones.

Don't jump immediately to blaming poor performance on qgroups or
autodefrag. These do sometimes have kernel regressions but not all
the time.

Emphasize advantage of controlling bees deferred work requests at the
source, before btrfs gets stuck committing them.

Avoid asserting that it's OK for gdb to crash.

Remove mention of lower-layer block device issues wrt corruption.

Signed-off-by: Zygo Blaxell
---
 docs/wrong.md | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/docs/wrong.md b/docs/wrong.md
index d4ffbb7..f78ed7a 100644
--- a/docs/wrong.md
+++ b/docs/wrong.md
@@ -4,16 +4,13 @@ What to do when something goes wrong with bees
 Hangs and excessive slowness
 ----------------------------
 
-### Are you using qgroups or autodefrag?
-
- Read about [bad btrfs feature interactions](btrfs-other.md).
-
 ### Use load-throttling options
 
 If bees is just more aggressive than you would like, consider using
 [load throttling options](options.md). These are usually more effective
 than `ionice`, `schedtool`, and the `blkio` cgroup (though you can
- certainly use those too).
+ certainly use those too) because they limit work that bees queues up
+ for later execution inside btrfs.
 
 ### Check `$BEESSTATUS`
 
@@ -52,10 +49,6 @@ dst = 15 /run/bees/ede84fbd-cb59-0c60-9ea7-376fa4984887/data.new/home/builder/li
 
 Thread names of note:
 
- * `crawl_12345`: scan/dedupe worker threads (the number is the subvol
-   ID which the thread is currently working on). These threads appear
-   and disappear from the status dynamically according to the requirements
-   of the work queue and loadavg throttling.
  * `bees`: main thread (doesn't do anything after startup, but its task execution time is that of the whole bees process)
  * `crawl_master`: task that finds new extents in the filesystem and populates the work queue
  * `crawl_transid`: btrfs transid (generation number) tracker and polling thread
@@ -64,6 +57,13 @@ dst = 15 /run/bees/ede84fbd-cb59-0c60-9ea7-376fa4984887/data.new/home/builder/li
  * `hash_writeback`: trickle-writes the hash table back to `beeshash.dat`
  * `hash_prefetch`: prefetches the hash table at startup and updates `beesstats.txt` hourly
 
+Most other threads have names that are derived from the current dedupe
+task that they are executing:
+
+ * `ref_205ad76b1000_24K_50`: extent scan performing dedupe of btrfs extent bytenr `205ad76b1000`, which is 24 KiB long and has 50 references
+ * `extent_250_32M_16E`: extent scan searching for extents between 32 MiB + 1 and 16 EiB bytes long, tracking scan position in virtual subvol `250`.
+ * `crawl_378_18916`: subvol scan searching for extent refs in subvol `378`, inode `18916`.
+
 ### Dump kernel stacks of hung processes
 
 Check the kernel stacks of all blocked kernel processes:
@@ -91,7 +91,7 @@ bees Crashes
 (gdb) thread apply all bt full
 
 The last line generates megabytes of output and will often crash gdb.
- This is OK, submit whatever output gdb can produce.
+ Submit whatever output gdb can produce.
 
 **Note that this output may include filenames or data from your
 filesystem.**
@@ -160,8 +160,7 @@ Kernel crashes, corruption, and filesystem damage
 -------------------------------------------------
 
 bees doesn't do anything that _should_ cause corruption or data loss;
-however, [btrfs has kernel bugs](btrfs-kernel.md) and [interacts poorly
-with some Linux block device layers](btrfs-other.md), so corruption is
+however, [btrfs has kernel bugs](btrfs-kernel.md), so corruption is
 not impossible.
 
 Issues with the btrfs filesystem kernel code or other block device layers