diff --git a/docs/config.md b/docs/config.md
index 98e8d1d..b214e70 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -8,9 +8,10 @@ are reasonable in most cases.
 Hash Table Sizing
 -----------------
 
-Hash table entries are 16 bytes per data block.  The hash table stores
-the most recently read unique hashes.  Once the hash table is full,
-each new entry in the table evicts an old entry.
+Hash table entries are 16 bytes per data block.  The hash table stores the
+most recently read unique hashes.  Once the hash table is full, each new
+entry added to the table evicts an old entry.  This makes the hash table
+a sliding window over the most recently scanned data from the filesystem.
 
 Here are some numbers to estimate appropriate hash table sizes:
 
@@ -25,9 +26,11 @@
 Notes:
 
  * If the hash table is too large, no extra dedupe efficiency is
-obtained, and the extra space just wastes RAM.  Extra space can also slow
-bees down by preventing old data from being evicted, so bees wastes time
-looking for matching data that is no longer present on the filesystem.
+obtained, and the extra space wastes RAM.  If the hash table contains
+more block records than there are blocks in the filesystem, the extra
+space can slow bees down.  A table that is too large prevents obsolete
+data from being evicted, so bees wastes time looking for matching data
+that is no longer present on the filesystem.
 
  * If the hash table is too small, bees extrapolates from matching
 blocks to find matching adjacent blocks in the filesystem that have been
@@ -36,6 +39,10 @@ one block in common between two extents in order to be able to dedupe
 the entire extents.  This provides significantly more dedupe hit rate
 per hash table byte than other dedupe tools.
 
+ * There is a fairly wide range of usable hash sizes, and performance
+degrades according to a smooth probabilistic curve in both directions.
+Double or half the optimum size usually works just as well.
+
  * When counting unique data in compressed data blocks to estimate
 optimum hash table size, count the *uncompressed* size of the data.
 
@@ -66,11 +73,11 @@ data on an uncompressed filesystem.  Dedupe efficiency falls dramatically
 with hash tables smaller than 128MB/TB as the average dedupe extent size
 is larger than the largest possible compressed extent size (128KB).
 
-* **Short writes** also shorten the average extent length and increase
-optimum hash table size.  If a database writes to files randomly using
-4K page writes, all of these extents will be 4K in length, and the hash
-table size must be increased to retain each one (or the user must accept
-a lower dedupe hit rate).
+* **Short writes or fragmentation** also shorten the average extent
+length and increase optimum hash table size.  If a database writes to
+files randomly using 4K page writes, all of these extents will be 4K
+in length, and the hash table size must be increased to retain each one
+(or the user must accept a lower dedupe hit rate).
 
 Defragmenting files that have had many short writes increases the
 extent length and therefore reduces the optimum hash table size.
diff --git a/docs/gotchas.md b/docs/gotchas.md
index 93bfc5e..a60cac1 100644
--- a/docs/gotchas.md
+++ b/docs/gotchas.md
@@ -51,73 +51,36 @@ loops early.  The exception text in this case is:
 Terminating bees with SIGTERM
 -----------------------------
 
-bees is designed to survive host crashes, so it is safe to terminate
-bees using SIGKILL; however, when bees next starts up, it will repeat
-some work that was performed between the last bees crawl state save point
-and the SIGKILL (up to 15 minutes).  If bees is stopped and started less
-than once per day, then this is not a problem as the proportional impact
-is quite small; however, users who stop and start bees daily or even
-more often may prefer to have a clean shutdown with SIGTERM so bees can
-restart faster.
+bees is designed to survive host crashes, so it is safe to terminate bees
+using SIGKILL; however, when bees next starts up, it will repeat some
+work that was performed between the last bees crawl state save point
+and the SIGKILL (up to 15 minutes), and a large hash table may not be
+completely written back to disk, so some duplicate matches will be lost.
 
-bees handling of SIGTERM can take a long time on machines with some or
-all of:
+If bees is stopped and started less than once per week, then this is not
+a problem as the proportional impact is quite small; however, users who
+stop and start bees daily or even more often may prefer to have a clean
+shutdown with SIGTERM so bees can restart faster.
 
- * Large RAM and `vm.dirty_ratio`
- * Large number of active bees worker threads
- * Large number of bees temporary files (proportional to thread count)
- * Large hash table size
- * Large filesystem size
- * High IO latency, especially "low power" spinning disks
- * High filesystem activity, especially duplicate data writes
+The shutdown procedure performs these steps:
 
-Each of these factors individually increases the total time required
-to perform a clean bees shutdown.  When combined, the factors can
-multiply with each other, dramatically increasing the time required to
-flush bees state to disk.
-
-On a large system with many of the above factors present, a "clean"
-bees shutdown can take more than 20 minutes.  Even a small machine
-(16GB RAM, 1GB hash table, 1TB NVME disk) can take several seconds to
-complete a SIGTERM shutdown.
-
-The shutdown procedure performs potentially long-running tasks in
-this order:
-
- 1. Worker threads finish executing their current Task and exit.
-    Threads executing `LOGICAL_INO` ioctl calls usually finish quickly,
-    but btrfs imposes no limit on the ioctl's running time, so it
-    can take several minutes in rare bad cases.  If there is a btrfs
-    commit already in progress on the filesystem, then most worker
-    threads will be blocked until the btrfs commit is finished.
-
- 2. Crawl state is saved to `$BEESHOME`.  This normally completes
-    relatively quickly (a few seconds at most).  This is the most
+ 1. Crawl state is saved to `$BEESHOME`.  This is the most
    important bees state to save to disk as it directly impacts
-   restart time, so it is done as early as possible (but no earlier).
+   restart time, so it is done as early as possible.
 
- 3. Hash table is written to disk.  Normally the hash table is
-    trickled back to disk at a rate of about 2GB per hour;
+ 2. Hash table is written to disk.  Normally the hash table is
+    trickled back to disk at a rate of about 128KiB per second;
    however, SIGTERM causes bees to attempt to flush the whole table
-    immediately.  If bees has recently been idle then the hash table is
-    likely already flushed to disk, so this step will finish quickly;
-    however, if bees has recently been active and the hash table is
-    large relative to RAM size, the blast of rapidly written data
-    can force the Linux VFS to block all writes to the filesystem
-    for sufficient time to complete all pending btrfs metadata
-    writes which accumulated during the btrfs commit before bees
-    received SIGTERM...and _then_ let bees write out the hash table.
-    The time spent here depends on the size of RAM, speed of disks,
-    and aggressiveness of competing filesystem workloads.
+    immediately.  The time spent here depends on the size of RAM, speed
+    of disks, and aggressiveness of competing filesystem workloads.
+    It can trigger `vm.dirty_bytes` limits and block other processes
+    writing to the filesystem for a while.
 
- 4. bees temporary files are closed, which implies deletion of their
-    inodes.  These are files which consist entirely of shared extent
-    structures, and btrfs takes an unusually long time to delete such
-    files (up to a few minutes for each on slow spinning disks).
+ 3. The bees process calls `_exit`, which terminates all running
+    worker threads, and closes and deletes all temporary files.  This
+    can take a while _after_ the bees process exits, especially on
+    slow spinning disks.
 
-If bees is terminated with SIGKILL, only step #1 and #4 are performed (the
-kernel performs these automatically if bees exits).  This reduces the
-shutdown time at the cost of increased startup time.
-
 Balances
 --------
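The hash table sizing arithmetic in the config.md hunks above can be sketched as a rough calculator. This is an illustration under stated assumptions, not part of the patch: it takes the docs' 16-byte entry and 128KB maximum compressed extent figures as given, but the 4 KiB block size, the one-retained-entry-per-extent simplification (from the extrapolation note), and the helper name are ours.

```python
def hash_table_bytes(unique_data_bytes, avg_extent_bytes=4096, entry_bytes=16):
    """Estimate hash table size: one 16-byte entry retained per extent,
    since bees only needs one matching block in an extent to extrapolate
    and dedupe the entire extent.  (Sketch; names and the 4 KiB default
    block size are assumptions, not from the bees docs.)"""
    return (unique_data_bytes // avg_extent_bytes) * entry_bytes

TiB = 1024 ** 4
MiB = 1024 ** 2

# Worst case: 1 TiB of unique data fragmented into 4 KiB extents
# (e.g. random 4K database page writes) -- every block needs its own entry.
print(hash_table_bytes(1 * TiB) // MiB, "MiB")  # 4096 MiB

# 1 TiB of unique data in 128 KiB extents (the largest possible
# compressed extent) needs one entry per extent -- the 128MB/TB figure.
print(hash_table_bytes(1 * TiB, avg_extent_bytes=128 * 1024) // MiB, "MiB")  # 128 MiB
```

The two calls bracket the "fairly wide range of usable hash sizes" the patch describes: average extent length, not raw data size, is what moves the optimum.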