mirror of https://github.com/Zygo/bees.git synced 2025-05-17 13:25:45 +02:00

docs: update the bug reporting and status instructions

Thread names have changed.  Document some of the newer ones.

Don't jump immediately to blaming poor performance on qgroups or
autodefrag.  These do sometimes have kernel regressions but not all
the time.

Emphasize advantage of controlling bees deferred work requests at the
source, before btrfs gets stuck committing them.

Avoid asserting that it's OK for gdb to crash.

Remove mention of lower-layer block device issues wrt corruption.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Zygo Blaxell 2025-01-11 01:01:40 -05:00
parent e457f502b7
commit 7fcde97b70


@ -4,16 +4,13 @@ What to do when something goes wrong with bees
 Hangs and excessive slowness
 ----------------------------
-### Are you using qgroups or autodefrag?
-Read about [bad btrfs feature interactions](btrfs-other.md).
 ### Use load-throttling options
 If bees is just more aggressive than you would like, consider using
 [load throttling options](options.md). These are usually more effective
 than `ionice`, `schedtool`, and the `blkio` cgroup (though you can
-certainly use those too).
+certainly use those too) because they limit work that bees queues up
+for later execution inside btrfs.
 ### Check `$BEESSTATUS`
@ -52,10 +49,6 @@ dst = 15 /run/bees/ede84fbd-cb59-0c60-9ea7-376fa4984887/data.new/home/builder/li
 Thread names of note:
-* `crawl_12345`: scan/dedupe worker threads (the number is the subvol
-ID which the thread is currently working on). These threads appear
-and disappear from the status dynamically according to the requirements
-of the work queue and loadavg throttling.
 * `bees`: main thread (doesn't do anything after startup, but its task execution time is that of the whole bees process)
 * `crawl_master`: task that finds new extents in the filesystem and populates the work queue
 * `crawl_transid`: btrfs transid (generation number) tracker and polling thread
@ -64,6 +57,13 @@ dst = 15 /run/bees/ede84fbd-cb59-0c60-9ea7-376fa4984887/data.new/home/builder/li
 * `hash_writeback`: trickle-writes the hash table back to `beeshash.dat`
 * `hash_prefetch`: prefetches the hash table at startup and updates `beesstats.txt` hourly
+Most other threads have names that are derived from the current dedupe
+task that they are executing:
+* `ref_205ad76b1000_24K_50`: extent scan performing dedupe of btrfs extent bytenr `205ad76b1000`, which is 24 KiB long and has 50 references
+* `extent_250_32M_16E`: extent scan searching for extents between 32 MiB + 1 and 16 EiB bytes long, tracking scan position in virtual subvol `250`.
+* `crawl_378_18916`: subvol scan searching for extent refs in subvol `378`, inode `18916`.
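The naming scheme described above can be decoded mechanically when reading a `$BEESSTATUS` file. The following is a rough sketch only: `classify_thread_name`, `FIXED_NAMES`, and the regular expressions are hypothetical helpers inferred from the three example names, not part of bees, and the list of fixed names covers only the threads visible in this diff.

```python
import re

# Fixed thread names shown in this diff (assumption: the real list is longer).
FIXED_NAMES = {"bees", "crawl_master", "crawl_transid",
               "hash_writeback", "hash_prefetch"}

# Patterns inferred from the documented examples; the exact format bees
# emits may differ, so treat these as illustrative.
SIZE = r"\d+[KMGTPE]?"
PATTERNS = [
    # ref_<bytenr hex>_<extent length>_<reference count>
    ("extent dedupe",
     re.compile(rf"^ref_(?P<bytenr>[0-9a-f]+)_(?P<length>{SIZE})_(?P<refs>\d+)$")),
    # extent_<virtual subvol>_<lower size bound>_<upper size bound>
    ("extent scan",
     re.compile(rf"^extent_(?P<subvol>\d+)_(?P<min_size>{SIZE})_(?P<max_size>{SIZE})$")),
    # crawl_<subvol>_<inode>
    ("subvol scan",
     re.compile(r"^crawl_(?P<subvol>\d+)_(?P<inode>\d+)$")),
]


def classify_thread_name(name):
    """Return (kind, fields) for a thread name taken from the status file."""
    if name in FIXED_NAMES:
        return ("fixed", {})
    for kind, pattern in PATTERNS:
        match = pattern.match(name)
        if match:
            return (kind, match.groupdict())
    return ("unknown", {})


if __name__ == "__main__":
    for example in ("ref_205ad76b1000_24K_50", "extent_250_32M_16E",
                    "crawl_378_18916", "crawl_master"):
        print(example, classify_thread_name(example))
```

Fixed names are checked first so that `crawl_master` and `crawl_transid` are never mistaken for subvol scan tasks.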
 ### Dump kernel stacks of hung processes
 Check the kernel stacks of all blocked kernel processes:
@ -91,7 +91,7 @@ bees Crashes
 (gdb) thread apply all bt full
 The last line generates megabytes of output and will often crash gdb.
-This is OK, submit whatever output gdb can produce.
+Submit whatever output gdb can produce.
 **Note that this output may include filenames or data from your
 filesystem.**
@ -160,8 +160,7 @@ Kernel crashes, corruption, and filesystem damage
 -------------------------------------------------
 bees doesn't do anything that _should_ cause corruption or data loss;
-however, [btrfs has kernel bugs](btrfs-kernel.md) and [interacts poorly
-with some Linux block device layers](btrfs-other.md), so corruption is
+however, [btrfs has kernel bugs](btrfs-kernel.md), so corruption is
 not impossible.
 Issues with the btrfs filesystem kernel code or other block device layers