mirror of https://github.com/Zygo/bees.git
synced 2025-05-17 21:35:45 +02:00

docs: simplify the exit-with-SIGTERM description

The description now matches the code again.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>

parent f21569e88c
commit c354e77634
@@ -8,9 +8,10 @@ are reasonable in most cases.
 Hash Table Sizing
 -----------------
 
-Hash table entries are 16 bytes per data block. The hash table stores
-the most recently read unique hashes. Once the hash table is full,
-each new entry in the table evicts an old entry.
+Hash table entries are 16 bytes per data block. The hash table stores the
+most recently read unique hashes. Once the hash table is full, each new
+entry added to the table evicts an old entry. This makes the hash table
+a sliding window over the most recently scanned data from the filesystem.
 
 Here are some numbers to estimate appropriate hash table sizes:
 
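The sliding-window behaviour added in this hunk can be sketched with a toy model (an illustration only; the real bees hash table is a fixed-size on-disk structure, not a Python dict, and `SlidingHashTable` is a hypothetical name):

```python
from collections import OrderedDict

class SlidingHashTable:
    """Toy model: keep the most recently read hashes, evict the oldest."""
    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # block hash -> block address

    def insert(self, block_hash, addr):
        if block_hash in self.entries:
            self.entries.move_to_end(block_hash)  # refresh on re-read
        self.entries[block_hash] = addr
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # full: evict the oldest entry

    def lookup(self, block_hash):
        return self.entries.get(block_hash)

t = SlidingHashTable(max_entries=2)
t.insert(0xAA, 1)
t.insert(0xBB, 2)
t.insert(0xCC, 3)
print(t.lookup(0xAA))  # None -- the oldest hash slid out of the window
print(t.lookup(0xCC))  # 3
```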
@@ -25,9 +26,11 @@ Here are some numbers to estimate appropriate hash table sizes:
 Notes:
 
-* If the hash table is too large, no extra dedupe efficiency is
-obtained, and the extra space just wastes RAM. Extra space can also slow
-bees down by preventing old data from being evicted, so bees wastes time
-looking for matching data that is no longer present on the filesystem.
+* If the hash table is too large, no extra dedupe efficiency is
+obtained, and the extra space wastes RAM. If the hash table contains
+more block records than there are blocks in the filesystem, the extra
+space can slow bees down. A table that is too large prevents obsolete
+data from being evicted, so bees wastes time looking for matching data
+that is no longer present on the filesystem.
 
 * If the hash table is too small, bees extrapolates from matching
 blocks to find matching adjacent blocks in the filesystem that have been
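The "more block records than there are blocks in the filesystem" bound above is simple arithmetic; a hypothetical helper, assuming 4KiB blocks and the 16-byte entry size stated earlier in this document:

```python
def max_useful_entries(filesystem_bytes, block_size=4096):
    """Beyond one entry per filesystem block, extra table space cannot help."""
    return filesystem_bytes // block_size

# A 16 GiB hash table holds (16 GiB / 16 B) = 2**30 entries.  On a 2 TiB
# filesystem there are only 2**29 4 KiB blocks, so half the table is wasted.
entries = (16 << 30) // 16
print(entries > max_useful_entries(2 << 40))  # True -- table is oversized
```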
@@ -36,6 +39,10 @@ one block in common between two extents in order to be able to dedupe
 the entire extents. This provides significantly more dedupe hit rate
 per hash table byte than other dedupe tools.
 
+* There is a fairly wide range of usable hash table sizes, and performance
+degrades according to a smooth probabilistic curve in both directions.
+Double or half the optimum size usually works just as well.
+
 * When counting unique data in compressed data blocks to estimate
 optimum hash table size, count the *uncompressed* size of the data.
 
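The extrapolation step that makes small tables workable (one block in common is enough to dedupe whole extents) can be sketched as follows; a simplified illustration, not the actual bees matching code:

```python
def extend_match(src, dst, i, j):
    """Toy extrapolation: given src[i] == dst[j], grow the match in both
    directions by comparing adjacent blocks directly (no hashes needed)."""
    lo = 0
    while i - lo > 0 and j - lo > 0 and src[i - lo - 1] == dst[j - lo - 1]:
        lo += 1
    hi = 1
    while i + hi < len(src) and j + hi < len(dst) and src[i + hi] == dst[j + hi]:
        hi += 1
    return (i - lo, i + hi)  # half-open matched block range in src

# Only block "C" needs a surviving hash table entry; "B" and "D" are
# found by comparing neighbours of the single hash hit.
src = ["A", "B", "C", "D", "E"]
dst = ["x", "B", "C", "D", "y"]
print(extend_match(src, dst, 2, 2))  # (1, 4): blocks B..D deduped from one hit
```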
@@ -66,11 +73,11 @@ data on an uncompressed filesystem. Dedupe efficiency falls dramatically
 with hash tables smaller than 128MB/TB as the average dedupe extent size
 is larger than the largest possible compressed extent size (128KB).
 
-* **Short writes** also shorten the average extent length and increase
-optimum hash table size. If a database writes to files randomly using
-4K page writes, all of these extents will be 4K in length, and the hash
-table size must be increased to retain each one (or the user must accept
-a lower dedupe hit rate).
+* **Short writes or fragmentation** also shorten the average extent
+length and increase optimum hash table size. If a database writes to
+files randomly using 4K page writes, all of these extents will be 4K
+in length, and the hash table size must be increased to retain each one
+(or the user must accept a lower dedupe hit rate).
 
 Defragmenting files that have had many short writes increases the
 extent length and therefore reduces the optimum hash table size.
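The sizing figures above follow from keeping roughly one retained entry per dedupe extent; a hypothetical helper reproducing the arithmetic (16 bytes per entry, as stated earlier in this document):

```python
def hash_table_bytes(unique_data_bytes, avg_extent_size=4096, entry_size=16):
    """Hash table size needed to keep roughly one entry per extent.

    One surviving entry per extent is enough, because bees extends a
    single block match across the whole extent.
    """
    return (unique_data_bytes // avg_extent_size) * entry_size

# 1 TiB of unique data in maximal 128K compressed extents -> 128 MiB/TiB
print(hash_table_bytes(1 << 40, 128 * 1024))  # 134217728
# the same data rewritten as 4K database pages -> 4 GiB/TiB
print(hash_table_bytes(1 << 40, 4096))        # 4294967296
```

This is why short writes push the optimum table size up by three orders of magnitude in the database example.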
@@ -51,73 +51,36 @@ loops early. The exception text in this case is:
 Terminating bees with SIGTERM
 -----------------------------
 
-bees is designed to survive host crashes, so it is safe to terminate
-bees using SIGKILL; however, when bees next starts up, it will repeat
-some work that was performed between the last bees crawl state save point
-and the SIGKILL (up to 15 minutes). If bees is stopped and started less
-than once per day, then this is not a problem as the proportional impact
-is quite small; however, users who stop and start bees daily or even
-more often may prefer to have a clean shutdown with SIGTERM so bees can
-restart faster.
-
-bees handling of SIGTERM can take a long time on machines with some or
-all of:
-
-* Large RAM and `vm.dirty_ratio`
-* Large number of active bees worker threads
-* Large number of bees temporary files (proportional to thread count)
-* Large hash table size
-* Large filesystem size
-* High IO latency, especially "low power" spinning disks
-* High filesystem activity, especially duplicate data writes
-
-Each of these factors individually increases the total time required
-to perform a clean bees shutdown. When combined, the factors can
-multiply with each other, dramatically increasing the time required to
-flush bees state to disk.
-
-On a large system with many of the above factors present, a "clean"
-bees shutdown can take more than 20 minutes. Even a small machine
-(16GB RAM, 1GB hash table, 1TB NVME disk) can take several seconds to
-complete a SIGTERM shutdown.
-
-The shutdown procedure performs potentially long-running tasks in
-this order:
-
-1. Worker threads finish executing their current Task and exit.
-Threads executing `LOGICAL_INO` ioctl calls usually finish quickly,
-but btrfs imposes no limit on the ioctl's running time, so it
-can take several minutes in rare bad cases. If there is a btrfs
-commit already in progress on the filesystem, then most worker
-threads will be blocked until the btrfs commit is finished.
-
-2. Crawl state is saved to `$BEESHOME`. This normally completes
-relatively quickly (a few seconds at most). This is the most
-important bees state to save to disk as it directly impacts
-restart time, so it is done as early as possible (but no earlier).
-
-3. Hash table is written to disk. Normally the hash table is
-trickled back to disk at a rate of about 2GB per hour;
-however, SIGTERM causes bees to attempt to flush the whole table
-immediately. If bees has recently been idle then the hash table is
-likely already flushed to disk, so this step will finish quickly;
-however, if bees has recently been active and the hash table is
-large relative to RAM size, the blast of rapidly written data
-can force the Linux VFS to block all writes to the filesystem
-for sufficient time to complete all pending btrfs metadata
-writes which accumulated during the btrfs commit before bees
-received SIGTERM...and _then_ let bees write out the hash table.
-The time spent here depends on the size of RAM, speed of disks,
-and aggressiveness of competing filesystem workloads.
-
-4. bees temporary files are closed, which implies deletion of their
-inodes. These are files which consist entirely of shared extent
-structures, and btrfs takes an unusually long time to delete such
-files (up to a few minutes for each on slow spinning disks).
-
-If bees is terminated with SIGKILL, only step #1 and #4 are performed (the
-kernel performs these automatically if bees exits). This reduces the
-shutdown time at the cost of increased startup time.
+bees is designed to survive host crashes, so it is safe to terminate bees
+using SIGKILL; however, when bees next starts up, it will repeat some
+work that was performed between the last bees crawl state save point
+and the SIGKILL (up to 15 minutes), and a large hash table may not be
+completely written back to disk, so some duplicate matches will be lost.
+
+If bees is stopped and started less than once per week, then this is not
+a problem as the proportional impact is quite small; however, users who
+stop and start bees daily or even more often may prefer to have a clean
+shutdown with SIGTERM so bees can restart faster.
+
+The shutdown procedure performs these steps:
+
+1. Crawl state is saved to `$BEESHOME`. This is the most
+important bees state to save to disk as it directly impacts
+restart time, so it is done as early as possible.
+
+2. Hash table is written to disk. Normally the hash table is
+trickled back to disk at a rate of about 128KiB per second;
+however, SIGTERM causes bees to attempt to flush the whole table
+immediately. The time spent here depends on the size of RAM, speed
+of disks, and aggressiveness of competing filesystem workloads.
+
+3. The bees process calls `_exit`, which terminates all running
+worker threads and closes and deletes all temporary files. This
+can take a while _after_ the bees process exits, especially on
+slow spinning disks.
 
 Balances
 --------
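The trickle rate in step 2 gives a feel for why the SIGTERM fast flush matters; a back-of-the-envelope calculation using the 128KiB/s figure above (the helper name is made up for illustration):

```python
def trickle_seconds(table_bytes, rate_bytes_per_sec=128 * 1024):
    """Time for the background writeback to cover the whole hash table."""
    return table_bytes / rate_bytes_per_sec

# A 1 GiB hash table takes over two hours to trickle back at 128 KiB/s,
# so a SIGTERM arriving after heavy activity may have a lot left to flush.
print(round(trickle_seconds(1 << 30) / 3600, 1))  # 2.3
```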