docs: add vmalloc bug to kernel bugs list

The bug is:

    v6.3-rc6: f349b15e183d mm: vmalloc: avoid warn_alloc noise caused by fatal signal

The fixes are:

    v6.4: 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
    v6.3.10: c189994b5dd3 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails

The bug has been backported to LTS, but the fix has not:

    v6.2.11: 61334bc29781 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
    v6.1.24: ef6bd8f64ce0 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
    v5.15.107: a184df0de132 mm: vmalloc: avoid warn_alloc noise caused by fatal signal

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
context: log when LOGICAL_INO returns 0 refs
2025-08-02 22:03:29 +02:00 · 2023-07-06 13:50:12 -04:00 · 2023-07-06 12:54:33 -04:00 · 2023-07-06 12:49:36 -04:00 · 2023-07-06 12:49:36 -04:00 · 2023-05-07 21:24:21 -04:00
14 changed files with 148 additions and 147 deletions
--- a/README.md
+++ b/README.md
@@ -17,7 +17,6 @@ Strengths
 * Space-efficient hash table and matching algorithms - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
 * Daemon incrementally dedupes new data using btrfs tree search
 * Works with btrfs compression - dedupe any combination of compressed and uncompressed files
- * **NEW** [Works around `btrfs send` problems with dedupe and incremental parent snapshots](docs/options.md)
 * Works around btrfs filesystem structure to free more disk space
 * Persistent hash table for rapid restart after shutdown
 * Whole-filesystem dedupe - including snapshots
@@ -70,6 +69,6 @@ You can also use Github:
 Copyright & License
 -------------------

-Copyright 2015-2022 Zygo Blaxell <bees@furryterror.org>.
+Copyright 2015-2023 Zygo Blaxell <bees@furryterror.org>.

 GPL (version 3 or later).
--- a/docs/btrfs-kernel.md
+++ b/docs/btrfs-kernel.md
@@ -7,23 +7,24 @@ First, a warning that is not specific to bees:
 severe regression that can lead to fatal metadata corruption.**
 This issue is fixed in kernel 5.4.14 and later.

-**Recommended kernel versions for bees are 4.19, 5.4, 5.10, 5.11, or 5.12,
-with recent LTS and -stable updates.**  The latest released kernel as
-of this writing is 5.18.18.
+**Recommended kernel versions for bees are 4.19, 5.4, 5.10, 5.11, 5.15,
+6.0, or 6.1, with recent LTS and -stable updates.**  The latest released
+kernel as of this writing is 6.4.1.

-4.14, 4.9, and 4.4 LTS kernels with recent updates are OK with
-some issues.  Older kernels will be slower (a little slower or a lot
-slower depending on which issues are triggered).  Not all fixes are
-backported.
+4.14, 4.9, and 4.4 LTS kernels with recent updates are OK with some
+issues.  Older kernels will be slower (a little slower or a lot slower
+depending on which issues are triggered).  Not all fixes are backported.

 Obsolete non-LTS kernels have a variety of unfixed issues and should
 not be used with btrfs.  For details see the table below.

 bees requires btrfs kernel API version 4.2 or higher, and does not work
-on older kernels.
+at all on older kernels.

-bees will detect and use btrfs kernel API up to version 4.15 if present.
-In some future bees release, this API version may become mandatory.
+Some bees features rely on kernel 4.15 to work, and these features will
+not be available on older kernels.  Currently, bees is still usable on
+older kernels with degraded performance or with options disabled, but
+support for older kernels may be removed.



@@ -58,14 +59,17 @@ These bugs are particularly popular among bees users, though not all are specifi
 | - | 5.8 | deadlock in `TREE_SEARCH` ioctl (core component of bees filesystem scanner), followed by regression in deadlock fix | 4.4.237, 4.9.237, 4.14.199, 4.19.146, 5.4.66, 5.8.10 and later | a48b73eca4ce btrfs: fix potential deadlock in the search ioctl, 1c78544eaa46 btrfs: fix wrong address when faulting in pages in the search ioctl
 | 5.7 | 5.10 | kernel crash if balance receives fatal signal e.g. Ctrl-C | 5.4.93, 5.10.11, 5.11 and later | 18d3bff411c8 btrfs: don't get an EINTR during drop_snapshot for reloc
 | 5.10 | 5.10 | 20x write performance regression | 5.10.8, 5.11 and later | e076ab2a2ca7 btrfs: shrink delalloc pages instead of full inodes
-| 5.4 | 5.11 | spurious tree checker failures on extent ref hash | 5.11.5, 5.12 and later | 1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match
+| 5.4 | 5.11 | spurious tree checker failures on extent ref hash | 5.4.125, 5.10.43, 5.11.5, 5.12 and later | 1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match
 | - | 5.11 | tree mod log issue #5 | 4.4.263, 4.9.263, 4.14.227, 4.19.183, 5.4.108, 5.10.26, 5.11.9, 5.12 and later | dbcc7d57bffc btrfs: fix race when cloning extent buffer during rewind of an old root
 | - | 5.12 | tree mod log issue #6 | 4.14.233, 4.19.191, 5.4.118, 5.10.36, 5.11.20, 5.12.3, 5.13 and later | f9690f426b21 btrfs: fix race when picking most recent mod log operation for an old root
 | 4.15 | 5.16 | spurious warnings from `fs/fs-writeback.c` when `flushoncommit` is enabled | 5.15.27, 5.16.13, 5.17 and later | a0f0cf8341e3 btrfs: get rid of warning on transaction commit when using flushoncommit
 | - | 5.17 | crash during device removal can make filesystem unmountable | 5.15.54, 5.16.20, 5.17.3, 5.18 and later | bbac58698a55 btrfs: remove device item and update super block in the same transaction
 | - | 5.18 | wrong superblock num_devices makes filesystem unmountable | 4.14.283, 4.19.247, 5.4.198, 5.10.121, 5.15.46, 5.17.14, 5.18.3, 5.19 and later | d201238ccd2f btrfs: repair super block num_devices automatically
 | 5.18 | 5.19 | parent transid verify failed during log tree replay after a crash during a rename operation | 5.18.18, 5.19.2, 6.0 and later | 723df2bcc9e1 btrfs: join running log transaction when logging new name
-| 5.4 | - | kernel hang when multiple threads are running `LOGICAL_INO` and dedupe ioctl | - | workaround: reduce bees thread count to 1 with `-c1`
+| 5.12 | 6.0 | space cache corruption and potential double allocations | 5.15.65, 5.19.6, 6.0 and later | ced8ecf026fd btrfs: fix space cache corruption and potential double allocations
+| 6.3, backported to 5.15.107, 6.1.24, 6.2.11 | 6.3 | vmalloc error, failed to allocate pages | 6.3.10, 6.4 and later.  Bug (f349b15e183d "mm: vmalloc: avoid warn_alloc noise caused by fatal signal" in v6.3-rc6) backported to 6.1.24, 6.2.11, and 5.15.107. | 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
+| 6.2 | 6.3 | `IGNORE_OFFSET` flag ignored in `LOGICAL_INO` ioctl | 6.2.16, 6.3.3, 6.4 and later | 0cad8f14d70c btrfs: fix backref walking not returning all inode refs
+| 5.4 | - | kernel hang when multiple threads are running `LOGICAL_INO` and dedupe ioctl on the same extent | - | workaround: avoid doing that

 "Last bad kernel" refers to that version's last stable update from
 kernel.org.  Distro kernels may backport additional fixes.  Consult
@@ -80,21 +84,45 @@ through 5.4.13 inclusive.
 A "-" for "first bad kernel" indicates the bug has been present since
 the relevant feature first appeared in btrfs.

-A "-" for "last bad kernel" indicates the bug has not yet been fixed as
-of 5.18.18.
+A "-" for "last bad kernel" indicates the bug has not yet been fixed in
+current kernels (see top of this page for which kernel version that is).

 In cases where issues are fixed by commits spread out over multiple
 kernel versions, "fixed kernel version" refers to the version that
-contains all components of the fix.
+contains the last committed component of the fix.


 Workarounds for known kernel bugs
 ---------------------------------

-* **Hangs with high worker thread counts**:  On kernels newer than
-  5.4, multiple threads running `LOGICAL_INO` and dedupe ioctls
-  at the same time can lead to a kernel hang.  The workaround is
-  to reduce the thread count to 1 with `-c1`.
+* **Hangs with concurrent `LOGICAL_INO` and dedupe**:  On all
+  kernel versions so far, multiple threads running `LOGICAL_INO`
+  and dedupe ioctls at the same time on the same inodes or extents
+  can lead to a kernel hang.  The kernel enters an infinite loop in
+  `add_all_parents`, where `count` is 0, `ref->count` is 1, and
+  `btrfs_next_item` or `btrfs_next_old_item` never finds a matching ref.
+
+  bees has two workarounds for this bug: 1. schedule work so that multiple
+  threads do not simultaneously access the same inode or the same extent,
+  and 2. use a brute-force global lock within bees that prevents any
+  thread from running `LOGICAL_INO` while any other thread is running
+  dedupe.
+
+  Workaround #1 isn't really a workaround, since we want to do the same
+  thing for unrelated performance reasons.  If multiple threads try to
+  perform dedupe operations on the same extent or inode, btrfs will make
+  all the threads wait for the same locks anyway, so it's better to have
+  bees find some other inode or extent to work on while waiting for btrfs
+  to finish.
+
+  Workaround #2 doesn't seem to be needed after implementing workaround
+  #1, but it's better to be slightly slower than to hang one CPU core
+  and the filesystem until the kernel is rebooted.
+
+  It is still theoretically possible to trigger the kernel bug when
+  running bees at the same time as other dedupers, or other programs
+  that use `LOGICAL_INO` like `btdu`; however, it's extremely difficult
+  to reproduce the bug without closely cooperating threads.

 * **Slow backrefs** (aka toxic extents):  Under certain conditions,
  if the number of references to a single shared extent grows too
@@ -110,8 +138,8 @@ Workarounds for known kernel bugs
  at this time of writing only bees has a workaround for this bug.

  This workaround is less necessary for kernels 5.4.96, 5.7 and later,
-  though it can still take 2 ms of CPU to resolve each extent ref on a
-  fast machine on a large, heavily fragmented file.
+  though the bees workaround can still be triggered on newer kernels
+  by changes in btrfs since kernel version 5.1.

 * **dedupe breaks `btrfs send` in old kernels**.  The bees option
  `--workaround-btrfs-send` prevents any modification of read-only subvols
@@ -127,8 +155,6 @@ Workarounds for known kernel bugs
 Unfixed kernel bugs
 -------------------

-As of 5.18.18:
-
 * **The kernel does not permit `btrfs send` and dedupe to run at the
  same time**.  Recent kernels no longer crash, but now refuse one
  operation with an error if the other operation was already running.
--- a/docs/btrfs-other.md
+++ b/docs/btrfs-other.md
@@ -8,44 +8,35 @@ bees has been tested in combination with the following:
 * HOLE extents and btrfs no-holes feature
 * Other deduplicators, reflink copies (though bees may decide to redo their work)
 * btrfs snapshots and non-snapshot subvols (RW and RO)
-* Concurrent file modification (e.g. PostgreSQL and sqlite databases, build daemons)
-* all btrfs RAID profiles
+* Concurrent file modification (e.g. PostgreSQL and sqlite databases, VMs, build daemons)
+* All btrfs RAID profiles
 * IO errors during dedupe (read errors will throw exceptions, bees will catch them and skip over the affected extent)
-* Filesystems mounted *with* the flushoncommit option ([lots of harmless kernel log warnings on 4.15 and later](btrfs-kernel.md))
-* Filesystems mounted *without* the flushoncommit option
+* Filesystems mounted with or without the `flushoncommit` option
 * 4K filesystem data block size / clone alignment
 * 64-bit and 32-bit LE host CPUs (amd64, x86, arm)
-* Huge files (>1TB--although Btrfs performance on such files isn't great in general)
-* filesystems up to 30T+ bytes, 100M+ files
+* Large files (kernel 5.4 or later strongly recommended)
+* Filesystems up to 90T+ bytes, 1000M+ files
 * btrfs receive
 * btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
 * open(O_DIRECT) (seems to work as well--or as poorly--with bees as with any other btrfs feature)
-* lvmcache:  no problems observed in testing with recent kernels or reported by users in the last year.
+* lvm dm-cache, writecache

 Bad Btrfs Feature Interactions
 ------------------------------

 bees has been tested in combination with the following, and various problems are known:

-* bcache:  no data-losing problems observed in testing with recent kernels
-  or reported by users in the last year.  Some issues observed with
-  bcache interacting badly with some SSD models' firmware, but so far
-  this only causes temporary loss of service, not filesystem damage.
-  This behavior does not seem to be specific to bees (ordinary filesystem
-  tests with rsync and snapshots will reproduce it), but it does prevent
-  any significant testing of bees on bcache.
-
-* btrfs send:  there are bugs in `btrfs send` that can be triggered by bees.
-  The [`--workaround-btrfs-send` option](options.md) works around this issue
-  by preventing bees from modifying read-only snapshots.
+* btrfs send:  there are bugs in `btrfs send` that can be triggered by
+  bees on old kernels.  The [`--workaround-btrfs-send` option](options.md)
+  works around this issue by preventing bees from modifying read-only
+  snapshots.

 * btrfs qgroups:  very slow, sometimes hangs...and it's even worse when
  bees is running.

-* btrfs autodefrag mount option:  hangs and high CPU usage problems
-  reported by users.  bees cannot distinguish autodefrag activity from
-  normal filesystem activity and will likely try to undo the autodefrag
-  if duplicate copies of the defragmented data exist.
+* btrfs autodefrag mount option:  bees cannot distinguish autodefrag
+  activity from normal filesystem activity, and may try to undo the
+  autodefrag if duplicate copies of the defragmented data exist.

 Untested Btrfs Feature Interactions
 -----------------------------------
@@ -54,9 +45,10 @@ bees has not been tested with the following, and undesirable interactions may oc

 * Non-4K filesystem data block size (should work if recompiled)
 * Non-equal hash (SUM) and filesystem data block (CLONE) sizes (need to fix that eventually)
-* btrfs seed filesystems (does anyone even use those?)
-* btrfs out-of-tree kernel patches (e.g. in-kernel dedupe or encryption)
+* btrfs seed filesystems (no particular reason it wouldn't work, but no one has reported trying)
+* btrfs out-of-tree kernel patches (e.g. in-kernel dedupe, encryption, extent tree v2)
 * btrfs-convert from ext2/3/4 (never tested, might run out of space or ignore significant portions of the filesystem due to sanity checks)
 * btrfs mixed block groups (don't know a reason why it would *not* work, but never tested)
-* flashcache: an out-of-tree cache-HDD-on-SSD block layer helper.
 * Host CPUs with exotic page sizes, alignment requirements, or endianness (ppc, alpha, sparc, strongarm, s390, mips, m68k...)
+* bcache: used to be in the "bad" list, now in the "untested" list because nobody is rigorously testing, and bcache bugs come and go
+* flashcache: an out-of-tree cache-HDD-on-SSD block layer helper
--- a/docs/config.md
+++ b/docs/config.md
@@ -8,9 +8,10 @@ are reasonable in most cases.
 Hash Table Sizing
 -----------------

-Hash table entries are 16 bytes per data block.  The hash table stores
-the most recently read unique hashes.  Once the hash table is full,
-each new entry in the table evicts an old entry.
+Hash table entries are 16 bytes per data block.  The hash table stores the
+most recently read unique hashes.  Once the hash table is full, each new
+entry added to the table evicts an old entry.  This makes the hash table
+a sliding window over the most recently scanned data from the filesystem.

 Here are some numbers to estimate appropriate hash table sizes:

@@ -25,9 +26,11 @@ Here are some numbers to estimate appropriate hash table sizes:
 Notes:

 * If the hash table is too large, no extra dedupe efficiency is
-obtained, and the extra space just wastes RAM.  Extra space can also slow
-bees down by preventing old data from being evicted, so bees wastes time
-looking for matching data that is no longer present on the filesystem.
+obtained, and the extra space wastes RAM.  If the hash table contains
+more block records than there are blocks in the filesystem, the extra
+space can slow bees down.  A table that is too large prevents obsolete
+data from being evicted, so bees wastes time looking for matching data
+that is no longer present on the filesystem.

 * If the hash table is too small, bees extrapolates from matching
 blocks to find matching adjacent blocks in the filesystem that have been
@@ -36,6 +39,10 @@ one block in common between two extents in order to be able to dedupe
 the entire extents.  This provides significantly more dedupe hit rate
 per hash table byte than other dedupe tools.

+ * There is a fairly wide range of usable hash sizes, and performance
+degrades according to a smooth probabilistic curve in both directions.
+Double or half the optimum size usually works just as well.
+
 * When counting unique data in compressed data blocks to estimate
 optimum hash table size, count the *uncompressed* size of the data.

@@ -66,11 +73,11 @@ data on an uncompressed filesystem.  Dedupe efficiency falls dramatically
 with hash tables smaller than 128MB/TB as the average dedupe extent size
 is larger than the largest possible compressed extent size (128KB).

-* **Short writes** also shorten the average extent length and increase
-optimum hash table size.  If a database writes to files randomly using
-4K page writes, all of these extents will be 4K in length, and the hash
-table size must be increased to retain each one (or the user must accept
-a lower dedupe hit rate).
+* **Short writes or fragmentation** also shorten the average extent
+length and increase optimum hash table size.  If a database writes to
+files randomly using 4K page writes, all of these extents will be 4K
+in length, and the hash table size must be increased to retain each one
+(or the user must accept a lower dedupe hit rate).

   Defragmenting files that have had many short writes increases the
 extent length and therefore reduces the optimum hash table size.
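
The sizing notes in `docs/config.md` above reduce to simple arithmetic: 16 bytes per hash table entry, and roughly one entry per average-sized dedupe extent. A minimal sketch of that estimate follows. `estimate_hash_table_bytes` is a hypothetical helper for illustration, not a function in bees, and the one-entry-per-extent model is an assumption inferred from the 128MB/TB figure quoted for 128KB extents.

```cpp
#include <cassert>
#include <cstdint>

// Size of one hash table entry, as stated in docs/config.md.
const std::uint64_t kHashEntryBytes = 16;

// Hypothetical helper (not part of bees): estimate the hash table size
// needed to keep one entry per average-sized extent of unique data.
// Assumes avg_extent_bytes is non-zero and sizes are uncompressed bytes.
inline std::uint64_t estimate_hash_table_bytes(std::uint64_t unique_data_bytes,
                                               std::uint64_t avg_extent_bytes)
{
    assert(avg_extent_bytes > 0);
    const std::uint64_t entries = unique_data_bytes / avg_extent_bytes;
    return entries * kHashEntryBytes;
}
```

With 1 TiB of unique data this gives 128 MiB for 128 KiB extents, which agrees with the 128MB/TB figure above, and 4 GiB for 4 KiB extents such as those produced by random 4K database writes. Since double or half the optimum usually works just as well, the result is a starting point rather than an exact requirement.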
--- a/docs/event-counters.md
+++ b/docs/event-counters.md
@@ -296,6 +296,7 @@ resolve

 The `resolve` event group consists of operations related to translating a btrfs virtual block address (i.e. physical block address) to a `(root, inode, offset)` tuple (i.e. locating and opening the file containing a matching block).  `resolve` is the top level, `chase` and `adjust` are the lower two levels.

+ * `resolve_empty`: The `LOGICAL_INO` ioctl returned successfully with an empty reference list (0 items).
 * `resolve_fail`: The `LOGICAL_INO` ioctl returned an error.
 * `resolve_large`: The `LOGICAL_INO` ioctl returned more than 2730 results (the limit of the v1 ioctl).
 * `resolve_ms`: Total time spent in the `LOGICAL_INO` ioctl (i.e. wallclock time, not kernel CPU time).
--- a/docs/gotchas.md
+++ b/docs/gotchas.md
@@ -51,81 +51,40 @@ loops early.  The exception text in this case is:
 Terminating bees with SIGTERM
 -----------------------------

-bees is designed to survive host crashes, so it is safe to terminate
-bees using SIGKILL; however, when bees next starts up, it will repeat
-some work that was performed between the last bees crawl state save point
-and the SIGKILL (up to 15 minutes).  If bees is stopped and started less
-than once per day, then this is not a problem as the proportional impact
-is quite small; however, users who stop and start bees daily or even
-more often may prefer to have a clean shutdown with SIGTERM so bees can
-restart faster.
+bees is designed to survive host crashes, so it is safe to terminate bees
+using SIGKILL; however, when bees next starts up, it will repeat some
+work that was performed between the last bees crawl state save point
+and the SIGKILL (up to 15 minutes), and a large hash table may not be
+completely written back to disk, so some duplicate matches will be lost.

-bees handling of SIGTERM can take a long time on machines with some or
-all of:
+If bees is stopped and started less than once per week, then this is not
+a problem as the proportional impact is quite small; however, users who
+stop and start bees daily or even more often may prefer to have a clean
+shutdown with SIGTERM so bees can restart faster.

-   * Large RAM and `vm.dirty_ratio`
-   * Large number of active bees worker threads
-   * Large number of bees temporary files (proportional to thread count)
-   * Large hash table size
-   * Large filesystem size
-   * High IO latency, especially "low power" spinning disks
-   * High filesystem activity, especially duplicate data writes
+The shutdown procedure performs these steps:

-Each of these factors individually increases the total time required
-to perform a clean bees shutdown.  When combined, the factors can
-multiply with each other, dramatically increasing the time required to
-flush bees state to disk.
-
-On a large system with many of the above factors present, a "clean"
-bees shutdown can take more than 20 minutes.  Even a small machine
-(16GB RAM, 1GB hash table, 1TB NVME disk) can take several seconds to
-complete a SIGTERM shutdown.
-
-The shutdown procedure performs potentially long-running tasks in
-this order:
-
-   1.  Worker threads finish executing their current Task and exit.
-       Threads executing `LOGICAL_INO` ioctl calls usually finish quickly,
-       but btrfs imposes no limit on the ioctl's running time, so it
-       can take several minutes in rare bad cases.  If there is a btrfs
-       commit already in progress on the filesystem, then most worker
-       threads will be blocked until the btrfs commit is finished.
-
-   2.  Crawl state is saved to `$BEESHOME`.  This normally completes
-       relatively quickly (a few seconds at most).  This is the most
+   1.  Crawl state is saved to `$BEESHOME`.  This is the most
       important bees state to save to disk as it directly impacts
-       restart time, so it is done as early as possible (but no earlier).
+       restart time, so it is done as early as possible.

-   3.  Hash table is written to disk.  Normally the hash table is
-       trickled back to disk at a rate of about 2GB per hour;
+   2.  Hash table is written to disk.  Normally the hash table is
+       trickled back to disk at a rate of about 128KiB per second;
       however, SIGTERM causes bees to attempt to flush the whole table
-       immediately.  If bees has recently been idle then the hash table is
-       likely already flushed to disk, so this step will finish quickly;
-       however, if bees has recently been active and the hash table is
-       large relative to RAM size, the blast of rapidly written data
-       can force the Linux VFS to block all writes to the filesystem
-       for sufficient time to complete all pending btrfs metadata
-       writes which accumulated during the btrfs commit before bees
-       received SIGTERM...and _then_ let bees write out the hash table.
-       The time spent here depends on the size of RAM, speed of disks,
-       and aggressiveness of competing filesystem workloads.
+       immediately.  The time spent here depends on the size of RAM, speed
+       of disks, and aggressiveness of competing filesystem workloads.
+       It can trigger `vm.dirty_bytes` limits and block other processes
+       writing to the filesystem for a while.

-   4.  bees temporary files are closed, which implies deletion of their
-       inodes.  These are files which consist entirely of shared extent
-       structures, and btrfs takes an unusually long time to delete such
-       files (up to a few minutes for each on slow spinning disks).
+   3.  The bees process calls `_exit`, which terminates all running
+       worker threads and closes and deletes all temporary files.  This
+       can take a while _after_ the bees process exits, especially on
+       slow spinning disks.

-If bees is terminated with SIGKILL, only step #1 and #4 are performed (the
-kernel performs these automatically if bees exits).  This reduces the
-shutdown time at the cost of increased startup time.

 Balances
 --------

-First, read [`LOGICAL_INO` and btrfs balance WARNING](btrfs-kernel.md).
-bees will suspend operations during a btrfs balance to work around
-kernel bugs.
-
 A btrfs balance relocates data on disk by making a new copy of the
 data, replacing all references to the old data with references to the
 new copy, and deleting the old copy.  To bees, this is the same as any
@@ -175,7 +134,9 @@ the beginning.

 Each time bees dedupes an extent that is referenced by a snapshot,
 the entire metadata page in the snapshot subvol (16KB by default) must
-be CoWed in btrfs.  This can result in a substantial increase in btrfs
+be CoWed in btrfs.  Since all references must be removed at the same
+time, this CoW operation is repeated in every snapshot containing the
+duplicate data.  This can result in a substantial increase in btrfs
 metadata size if there are many snapshots on a filesystem.

 Normally, metadata is small (less than 1% of the filesystem) and dedupe
@@ -252,17 +213,18 @@ Other Gotchas
  filesystem while `LOGICAL_INO` is running.  Generally the CPU spends
  most of the runtime of the `LOGICAL_INO` ioctl running the kernel,
  so on a single-core CPU the entire system can freeze up for a second
-  during operations on toxic extents.
+  during operations on toxic extents.  Note this only occurs on older
+  kernels.  See [the slow backrefs kernel bug section](btrfs-kernel.md).

 * If a process holds a directory FD open, the subvol containing the
  directory cannot be deleted (`btrfs sub del` will start the deletion
  process, but it will not proceed past the first open directory FD).
  `btrfs-cleaner` will simply skip over the directory *and all of its
  children* until the FD is closed.  bees avoids this gotcha by closing
-  all of the FDs in its directory FD cache every 10 btrfs transactions.
+  all of the FDs in its directory FD cache every btrfs transaction.

 * If a file is deleted while bees is caching an open FD to the file,
  bees continues to scan the file.  For very large files (e.g. VM
  images), the deletion of the file can be delayed indefinitely.
  To limit this delay, bees closes all FDs in its file FD cache every
-  10 btrfs transactions.
+  btrfs transaction.
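
The write-back rate quoted in shutdown step 2 above (about 128KiB per second) gives a rough idea of how stale the on-disk hash table can be after a SIGKILL. The sketch below computes how long one complete pass over the table takes at that rate. `trickle_seconds` is a hypothetical helper, not bees code, and it assumes the whole table is written once at a constant rate.

```cpp
#include <cassert>
#include <cstdint>

// Trickle write-back rate quoted in the documentation: 128 KiB per second.
const std::uint64_t kTrickleBytesPerSecond = 128 * 1024;

// Hypothetical helper (not part of bees): seconds needed to write
// table_bytes once at the trickle rate, rounded up to a whole second.
inline std::uint64_t trickle_seconds(std::uint64_t table_bytes)
{
    return (table_bytes + kTrickleBytesPerSecond - 1) / kTrickleBytesPerSecond;
}
```

For a 1 GiB hash table this yields 8192 seconds, a little under 2.3 hours per pass, which is why a clean SIGTERM shutdown that flushes the table immediately matters more for users who restart bees often.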
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -8,10 +8,12 @@ bees uses checkpoints for persistence to eliminate the IO overhead of a
 transactional data store.  On restart, bees will dedupe any data that
 was added to the filesystem since the last checkpoint.  Checkpoints
 occur every 15 minutes for scan progress, stored in `beescrawl.dat`.
-The hash table trickle-writes to disk at 4GB/hour to `beeshash.dat`.
-An hourly performance report is written to `beesstats.txt`.  There are
-no special requirements for bees hash table storage--`.beeshome` could
-be stored on a different btrfs filesystem, ext4, or even CIFS.
+The hash table trickle-writes to disk at 128KiB/s to `beeshash.dat`,
+but will flush immediately if bees is terminated by SIGTERM.
+
+There are no special requirements for bees hash table storage--`.beeshome`
+could be stored on a different btrfs filesystem, ext4, or even CIFS (but
+not MS-DOS--beeshome does need filenames longer than 8.3).

 bees uses a persistent dedupe hash table with a fixed size configured
 by the user.  Any size of hash table can be dedicated to dedupe.  If a
@@ -20,7 +22,7 @@ small as 128KB.

 The bees hash table is loaded into RAM at startup and `mlock`ed so it
 will not be swapped out by the kernel (if swap is permitted, performance
-degrades to nearly zero).
+degrades to nearly zero, for both bees and the swap device).

 bees scans the filesystem in a single pass which removes duplicate
 extents immediately after they are detected.  There are no distinct
@@ -83,12 +85,12 @@ of these functions in userspace, at the expense of encountering [some
 kernel bugs in `LOGICAL_INO` performance](btrfs-kernel.md).

 bees uses only the data-safe `FILE_EXTENT_SAME` (aka `FIDEDUPERANGE`)
-kernel operations to manipulate user data, so it can dedupe live data
-(e.g. build servers, sqlite databases, VM disk images).  It does not
-modify file attributes or timestamps.
+kernel ioctl to manipulate user data, so it can dedupe live data
+(e.g. build servers, sqlite databases, VM disk images).  bees does not
+modify file attributes or timestamps in deduplicated files.

-When bees has scanned all of the data, bees will pause until 10
-transactions have been completed in the btrfs filesystem.  bees tracks
+When bees has scanned all of the data, bees will pause until a new
+transaction has completed in the btrfs filesystem.  bees tracks
 the current btrfs transaction ID over time so that it polls less often
 on quiescent filesystems and more often on busy filesystems.

--- a/docs/index.md
+++ b/docs/index.md
@@ -17,7 +17,6 @@ Strengths
 * Space-efficient hash table and matching algorithms - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
 * Daemon incrementally dedupes new data using btrfs tree search
 * Works with btrfs compression - dedupe any combination of compressed and uncompressed files
- * **NEW** [Works around `btrfs send` problems with dedupe and incremental parent snapshots](options.md)
 * Works around btrfs filesystem structure to free more disk space
 * Persistent hash table for rapid restart after shutdown
 * Whole-filesystem dedupe - including snapshots
@@ -70,6 +69,6 @@ You can also use Github:
 Copyright & License
 -------------------

-Copyright 2015-2022 Zygo Blaxell <bees@furryterror.org>.
+Copyright 2015-2023 Zygo Blaxell <bees@furryterror.org>.

 GPL (version 3 or later).
--- a/docs/install.md
+++ b/docs/install.md
@@ -4,7 +4,7 @@ Building bees
 Dependencies
 ------------

-* C++11 compiler (tested with GCC 4.9, 6.3.0, 8.1.0)
+* C++11 compiler (tested with GCC 8.1.0, 12.2.0)

  Sorry.  I really like closures and shared_ptr, so support
  for earlier compiler versions is unlikely.
@@ -19,7 +19,7 @@ Dependencies

 * [Linux kernel version](btrfs-kernel.md) gets its own page.

-* markdown for documentation
+* markdown to build the documentation

 * util-linux version that provides `blkid` command for the helper
  script `scripts/beesd` to work
--- a/docs/missing.md
+++ b/docs/missing.md
@@ -2,8 +2,8 @@ Features You Might Expect That bees Doesn't Have
 ------------------------------------------------

 * There's no configuration file (patches welcome!).  There are
-some tunables hardcoded in the source that could eventually become
-configuration options.  There's also an incomplete option parser
+some tunables hardcoded in the source (`src/bees.h`) that could eventually
+become configuration options.  There's also an incomplete option parser
 (patches welcome!).

 * The bees process doesn't fork and writes its log to stdout/stderr.
@@ -43,3 +43,6 @@ compression method or not compress the data (patches welcome!).
 * It is theoretically possible to resize the hash table without starting
 over with a new full-filesystem scan; however, this feature has not been
 implemented yet.
+
+* btrfs maintains csums of data blocks which bees could use to improve
+scan speeds, but bees doesn't use them yet.
--- a/lib/btrfs-tree.cc
+++ b/lib/btrfs-tree.cc
@@ -548,7 +548,7 @@ namespace crucible {
 	#endif
 		const uint64_t logical_end = logical + count * block_size();
 		BtrfsTreeItem bti = rlower_bound(logical);
-		size_t loops = 0;
+		size_t __attribute__((unused)) loops = 0;
 		BCTFGS_DEBUG("get_sums " << to_hex(logical) << ".." << to_hex(logical_end) << endl);
 		while (!!bti) {
 			BCTFGS_DEBUG("get_sums[" << loops << "]: " << bti << endl);
--- a/src/bees-context.cc
+++ b/src/bees-context.cc
@@ -821,6 +821,10 @@ BeesContext::resolve_addr_uncached(BeesAddress addr)

 	// Avoid performance problems - pretend resolve failed if there are too many refs
 	const size_t rv_count = log_ino.m_iors.size();
+	if (!rv_count) {
+		BEESLOGDEBUG("LOGICAL_INO returned 0 refs at " << to_hex(addr));
+		BEESCOUNT(resolve_empty);
+	}
 	if (rv_count < BEES_MAX_EXTENT_REF_COUNT) {
 		rv.m_biors = vector<BtrfsInodeOffsetRoot>(log_ino.m_iors.begin(), log_ino.m_iors.end());
 	} else {
@@ -832,7 +836,7 @@ BeesContext::resolve_addr_uncached(BeesAddress addr)
 	if (sys_usage_delta < BEES_TOXIC_SYS_DURATION) {
 		rv.m_is_toxic = false;
 	} else {
-		BEESLOGNOTICE("WORKAROUND: toxic address: addr = " << addr << ", sys_usage_delta = " << round(sys_usage_delta* 1000.0) / 1000.0 << ", user_usage_delta = " << round(user_usage_delta * 1000.0) / 1000.0 << ", rt_age = " << rt_age << ", refs " << rv_count);
+		BEESLOGDEBUG("WORKAROUND: toxic address: addr = " << addr << ", sys_usage_delta = " << round(sys_usage_delta* 1000.0) / 1000.0 << ", user_usage_delta = " << round(user_usage_delta * 1000.0) / 1000.0 << ", rt_age = " << rt_age << ", refs " << rv_count);
 		BEESCOUNT(resolve_toxic);
 		rv.m_is_toxic = true;
 	}
--- a/src/bees-roots.cc
+++ b/src/bees-roots.cc
@@ -515,7 +515,12 @@ BeesRoots::transid_max_nocache()
 uint64_t
 BeesRoots::transid_max()
 {
-	return m_transid_re.count();
+	const auto rv = m_transid_re.count();
+	// transid must be greater than zero, or we did something very wrong
+	THROW_CHECK1(runtime_error, rv, rv > 0);
+	// transid must be less than max, or we did something very wrong
+	THROW_CHECK1(runtime_error, rv, rv < numeric_limits<uint64_t>::max());
+	return rv;
 }

 struct BeesFileCrawl {
--- a/test/limits.cc
+++ b/test/limits.cc
@@ -3,6 +3,7 @@
 #include "crucible/limits.h"

 #include <cassert>
+#include <cstdint>

 using namespace crucible;
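
The `transid_max()` hunk above rejects both a zero transid and one equal to the `uint64_t` maximum before returning. A minimal standalone sketch of the same range check follows. It assumes plain `throw` statements in place of crucible's `THROW_CHECK1` macro, and `checked_transid` is a hypothetical name that does not exist in bees:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of bees): mirrors the two range checks
// added to BeesRoots::transid_max(), using plain exceptions instead of
// the THROW_CHECK1 macro.
std::uint64_t checked_transid(std::uint64_t rv)
{
	// transid must be greater than zero
	if (rv == 0) {
		throw std::runtime_error("transid is zero");
	}
	// transid must be less than the maximum representable value
	if (rv == std::numeric_limits<std::uint64_t>::max()) {
		throw std::runtime_error("transid is at max: " + std::to_string(rv));
	}
	return rv;
}
```

Any value strictly between the two bounds is returned unchanged, so a caller can wrap a computed transid in this check without altering the normal path.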
Author	SHA1	Message	Date
Zygo Blaxell	124507232f	docs: add vmalloc bug to kernel bugs list The bug is: v6.3-rc6: f349b15e183d mm: vmalloc: avoid warn_alloc noise caused by fatal signal The fixes are: v6.4: 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails v6.3.10: c189994b5dd3 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails The bug has been backported to LTS, but the fix has not: v6.2.11: 61334bc29781 mm: vmalloc: avoid warn_alloc noise caused by fatal signal v6.1.24: ef6bd8f64ce0 mm: vmalloc: avoid warn_alloc noise caused by fatal signal v5.15.107: a184df0de132 mm: vmalloc: avoid warn_alloc noise caused by fatal signal Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-07-06 13:50:12 -04:00
Zygo Blaxell	3c5e13c885	context: log when LOGICAL_INO returns 0 refs There was a bug in kernel 6.3 where LOGICAL_INO with IGNORE_OFFSET sometimes fails to ignore the offset. That bug is now fixed, but LOGICAL_INO still returns 0 refs much more often than seems appropriate. This is most likely because bees frequently deletes extents while there is still work waiting for them in Task queues. In this case, LOGICAL_INO correctly returns an empty list, because every reference to some extent is deleted, but the new extent tree with that extent removed is not yet committed in btrfs. Add a DEBUG-level log message and an event counter to track these events. In the absence of a kernel bug, the debug message may indicate CPU time was wasted performing a search whose outcome could have been predicted. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-07-06 12:54:33 -04:00
Zygo Blaxell	a6ca2fa2f6	docs: add IGNORE_OFFSET regression in 6.2..6.3 to kernel bugs list This doesn't impact the current bees master, but it does break bees-next. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-07-06 12:49:36 -04:00
Zygo Blaxell	3f23a0c73f	context: downgrade toxic extent workaround message Toxic extents are much less of a problem now than they were in kernels before 5.7. Downgrade the log message level to reflect their lesser importance. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-07-06 12:49:36 -04:00
Zygo Blaxell	d6732c58e2	test: GCC 13 fix for limits.cc GCC complains that #include <cstdint> is missing, so add that. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-05-07 21:24:21 -04:00
Zygo Blaxell	75b2067cef	btrfs-tree: fix build on clang++16 The "loops" variable isn't read (only set) if not built with extra debug code. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-05-07 21:23:27 -04:00
Zygo Blaxell	da3ef216b1	docs: working around `btrfs send` issues isn't really a feature The critical kernel bugs in send have been fixed for years. The limitations that remain aren't bugs, and bees has no sustainable workaround for them. Also update copyright year range. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-03-07 10:25:51 -05:00
Zygo Blaxell	b7665d49d9	docs: fill in missing LTS backports for "1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match" Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-03-07 10:17:44 -05:00
Zygo Blaxell	717bdf5eb5	roots: make sure transid_max's computed value isn't max We check the result of transid_max_nocache(), but not the result of transid_max(). The latter is a computed result that is even more likely to be wrong. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:45:29 -05:00
Zygo Blaxell	9b60f2b94d	docs: add "missing" features that have been in development for some time already Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:42:42 -05:00
Zygo Blaxell	8978d63e75	docs: update GCC versions list and clarify markdown statement I don't know if anyone else is testing GCC versions before 8.0 any more, but I'm not. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:39:55 -05:00
Zygo Blaxell	82474b4ef4	docs: update front page At least one user was significantly confused by "designed for large filesystems". The btrfs send workarounds aren't new any more. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:38:50 -05:00
Zygo Blaxell	73834beb5a	docs: minor changes to how-it-works based on past user questions Clarify that "too large" and "too small" are some distance away from each other. The Goldilocks zone is _wide_. The interval between cache drops is now shorter. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:37:37 -05:00
Zygo Blaxell	c92ba117d8	docs: various gotcha updates Fixing the obviously wrong and out of date stuff. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:37:23 -05:00
Zygo Blaxell	c354e77634	docs: simplify the exit-with-SIGTERM description The description now matches the code again. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:36:44 -05:00
Zygo Blaxell	f21569e88c	docs: update the feature interactions page Fixing the obviously out-of-date and no-longer-tested things. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:34:22 -05:00
Zygo Blaxell	3d5ebe4d40	docs: update kernel bugs and workarounds list for 6.2.0 Remove some of the repetition to make the document easier to edit. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:32:52 -05:00
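
Commit 75b2067cef marks `loops` as unused because the variable is only set, never read, when the debug macro compiles to nothing. The sketch below illustrates that pattern in isolation. `SKETCH_DEBUG_LOG` and `sum_items` are hypothetical stand-ins rather than bees or crucible names, and the attribute syntax assumes GCC or Clang:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical stand-in for a debug macro such as BCTFGS_DEBUG:
// it expands to nothing unless SKETCH_DEBUG is defined at build time.
#ifdef SKETCH_DEBUG
#define SKETCH_DEBUG_LOG(x) do { std::cerr << x; } while (0)
#else
#define SKETCH_DEBUG_LOG(x) do { } while (0)
#endif

// Hypothetical function: sums a vector while counting loop iterations
// purely for the benefit of the debug output.
int sum_items(const std::vector<int> &items)
{
	// Without the attribute, a non-debug build only ever sets this
	// variable, which some compilers report as unused-but-set.
	std::size_t __attribute__((unused)) loops = 0;
	int total = 0;
	for (const auto item : items) {
		SKETCH_DEBUG_LOG("item[" << loops << "] = " << item << "\n");
		++loops;
		total += item;
	}
	return total;
}
```

C++17 offers `[[maybe_unused]]` for the same purpose, but the dependency list in this changeset still names a C++11 compiler, so the compiler-specific attribute is the form that matches the diff.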