Mirror of https://github.com/Zygo/bees.git (synced 2025-05-17 21:35:45 +02:00)
README: update known bugs and issues list
Also split "bad feature interactions" into "unknown" (which is what it really was before) and "bad" (which includes some filesystem-destroying problems).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
parent c3effe0a20
commit e564d27dda
89
README.md
@@ -152,13 +152,13 @@ Good Btrfs Feature Interactions

 Bees has been tested in combination with the following:

-* btrfs compression (either method), mixtures of compressed and uncompressed extents
+* btrfs compression (zlib, lzo, zstd), mixtures of compressed and uncompressed extents
 * PREALLOC extents (unconditionally replaced with holes)
 * HOLE extents and btrfs no-holes feature
 * Other deduplicators, reflink copies (though Bees may decide to redo their work)
-* btrfs snapshots and non-snapshot subvols (RW only)
+* btrfs snapshots and non-snapshot subvols (RW and RO)
 * Concurrent file modification (e.g. PostgreSQL and sqlite databases, build daemons)
-* all btrfs RAID profiles (people ask about this, but it's irrelevant)
+* all btrfs RAID profiles (people ask about this, but it's irrelevant to bees)
 * IO errors during dedup (read errors will throw exceptions, Bees will catch them and skip over the affected extent)
 * Filesystems mounted *with* the flushoncommit option
 * 4K filesystem data block size / clone alignment
@@ -166,25 +166,40 @@ Bees has been tested in combination with the following:
 * Large (>16M) extents
 * Huge files (>1TB--although Btrfs performance on such files isn't great in general)
 * filesystems up to 25T bytes, 100M+ files
-* btrfs read-only snapshots
+* btrfs receive
+* btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
+* open(O_DIRECT) (seems to work as well--or as poorly--with bees as with any other btrfs feature)

 Bad Btrfs Feature Interactions
 ------------------------------

+Bees has been tested in combination with the following, and various problems are known:
+
+* bcache, lvmcache: *severe (filesystem-destroying) metadata corruption
+issues* observed in testing and reported by users, apparently only when
+used with bees. Plain SSD and HDD seem to be OK.
+* btrfs send: sometimes aborts with an I/O error when bees changes the
+data layout during a send. The send can be restarted and will work
+if bees has finished processing the snapshot being sent. No data
+corruption observed other than the truncated send.
+* btrfs qgroups: very slow, sometimes hangs
+* btrfs autodefrag mount option: hangs and high CPU usage problems
+reported by users. bees cannot distinguish autodefrag activity from
+normal filesystem activity and will likely try to undo the autodefrag,
+so it should probably be turned off for bees in any case.
+
+Untested Btrfs Feature Interactions
+-----------------------------------
+
 Bees has not been tested with the following, and undesirable interactions may occur:

 * Non-4K filesystem data block size (should work if recompiled)
 * Non-equal hash (SUM) and filesystem data block (CLONE) sizes (probably never will work)
-* btrfs send/receive (receive is probably OK, but send could be confused?)
-* btrfs qgroups (never tested, no idea what might happen)
 * btrfs seed filesystems (does anyone even use those?)
-* btrfs autodefrag mount option (never tested, could fight with Bees)
-* btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
 * btrfs out-of-tree kernel patches (e.g. in-band dedup or encryption)
 * btrfs-convert from ext2/3/4 (never tested, might run out of space or ignore significant portions of the filesystem due to sanity checks)
 * btrfs mixed block groups (don't know a reason why it would *not* work, but never tested)
-* open(O_DIRECT)
-* Filesystems mounted *without* the flushoncommit option
+* Filesystems mounted *without* the flushoncommit option (don't know the impact of crashes during dedup writes vs. ordinary writes)

 Other Caveats
 -------------
@@ -251,7 +266,7 @@ in the future):

 Bug fixes (sometimes included in older LTS kernels):

-* Bugs fixed prior to 4.4.3 are not listed here.
+* Bugs fixed prior to 4.4.107 are not listed here.
 * 4.5: hang in the `INO_PATHS` ioctl used by Bees.
 * 4.5: use-after-free in the `FILE_EXTENT_SAME` ioctl used by Bees.
 * 4.6: lost inodes after a rename, crash, and log tree replay
@@ -264,10 +279,26 @@ Bug fixes (sometimes included in older LTS kernels):
 last one.
 * 4.14: backref performance improvements make LOGICAL_INO even faster
 in the worst cases (but possibly slower in the best cases?).
-* (unmerged): WARN_ON(ref->count < 0) in fs/btrfs/backref.c triggers
+* 4.14.29: WARN_ON(ref->count < 0) in fs/btrfs/backref.c triggers
 almost once per second. The WARN_ON is incorrect and can be removed.

-Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
+Unfixed kernel bugs (as of 4.14.34) with workarounds in Bees:

+* *Deadlocks* in the kernel dedup ioctl when files are modified
+immediately before dedup. `BeesTempFile::make_copy` calls `fsync()`
+immediately before dedup to work around this. If the `fsync()` is
+removed, the filesystem hangs within a few hours, requiring a reboot
+to recover. Even with the `fsync()`, it is possible to lose the
+kernel race condition and encounter a deadlock within a machine-year.
+VM image workloads may trigger this faster. Over the past years
+several specific deadlock cases have been fixed, but at least one
+remains.
+
+* *Bad interactions* with other Linux block layers: bcache and lvmcache
+can fail spectacularly, and apparently only while running bees.
+This is definitely a kernel bug, either in btrfs or the lower block
+layers. Avoid using bees with these tools, or test very carefully
+before deployment.
+
 * *slow backrefs* (aka toxic extents): If the number of references to a
 single shared extent within a single file grows above a few thousand,
@@ -276,7 +307,8 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 measuring the time the kernel spends performing certain operations
 and permanently blacklisting any extent or hash where the kernel
 starts to get slow. Inside Bees, such blocks are marked as 'toxic'
-hash/block addresses.
+hash/block addresses. Linux kernel v4.14 is better but can still
+have problems.

 * `LOGICAL_INO` output is arbitrarily limited to 2730 references
 even if more buffer space is provided for results. Once this number
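
The timing-based blacklist for slow backrefs described in the README text above can be sketched roughly as follows. This is an illustration only and not bees' own code: the `ToxicTracker` class, its method names, and the caller-supplied time limit are hypothetical, and the in-memory set merely stands in for whatever permanent record bees keeps.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <unordered_set>

// Hypothetical helper: remembers addresses whose kernel operations were slow.
class ToxicTracker {
public:
    using Clock = std::chrono::steady_clock;

    // `limit` is an assumed threshold; the README does not state a value.
    explicit ToxicTracker(std::chrono::milliseconds limit) : m_limit(limit)
    {
        assert(limit.count() > 0);
    }

    // Record how long an operation on `addr` took and mark the address
    // toxic if it exceeded the limit. Toxic marks are never cleared.
    void record(std::uint64_t addr, Clock::duration elapsed)
    {
        if (elapsed > m_limit) {
            m_toxic.insert(addr);
        }
    }

    bool is_toxic(std::uint64_t addr) const
    {
        return m_toxic.count(addr) != 0;
    }

    // Run `op` unless `addr` is already toxic, timing it with a monotonic
    // clock. Returns true if `op` was actually run.
    template <class Op>
    bool run_timed(std::uint64_t addr, Op &&op)
    {
        if (is_toxic(addr)) {
            return false;
        }
        const Clock::time_point start = Clock::now();
        op();
        record(addr, Clock::now() - start);
        return true;
    }

private:
    std::chrono::milliseconds m_limit;
    std::unordered_set<std::uint64_t> m_toxic;
};
```

A caller would wrap each potentially slow kernel call in `run_timed()`, so that an address that was slow once is skipped on every later attempt.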
@@ -299,19 +331,11 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 list of all extent refs referencing a data extent (i.e. Bees wants
 the compressed-extent behavior in all cases). *Fixed in v4.14.*

-* `LOGICAL_INO` was only called from one thread at any time per process.
-This means at most one core was irretrievably stuck in this ioctl.
-*Workaround removed in recent bees versions.*
-
 * `FILE_EXTENT_SAME` is arbitrarily limited to 16MB. This is less than
 128MB which is the maximum extent size that can be created by defrag
 or prealloc. Bees avoids feedback loops this can generate while
 attempting to replace extents over 16MB in length.

-* If the `fsync()` in `BeesTempFile::make_copy` is removed, the filesystem
-hangs within a few hours, requiring a reboot to recover. On the other
-hand, the `fsync()` only costs about 8% of overall performance.
-
 Not really bugs, but gotchas nonetheless:

 * If a process holds a directory FD open, the subvol containing the
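
To illustrate the 16MB `FILE_EXTENT_SAME` limit mentioned in the README text above, here is a minimal sketch of splitting a long range into requests that each fit under the limit. It assumes "16MB" means 16 MiB; `split_dedupe_range` and `kMaxDedupeBytes` are hypothetical names, and this is not necessarily how bees itself handles extents over 16MB.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Per-request limit from the README, assumed here to be 16 MiB.
constexpr std::uint64_t kMaxDedupeBytes = 16ULL * 1024 * 1024;

// Hypothetical helper: split [offset, offset + length) into consecutive
// (offset, length) pieces, each no larger than max_chunk bytes.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
split_dedupe_range(std::uint64_t offset, std::uint64_t length,
                   std::uint64_t max_chunk = kMaxDedupeBytes)
{
    assert(max_chunk > 0);
    std::vector<std::pair<std::uint64_t, std::uint64_t>> pieces;
    while (length > 0) {
        const std::uint64_t n = length < max_chunk ? length : max_chunk;
        pieces.emplace_back(offset, n);
        offset += n;
        length -= n;
    }
    return pieces;
}
```

Each resulting piece could then be submitted as its own dedup request; a 128MB extent, the maximum size the README mentions, would need eight of them.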
@@ -385,12 +409,12 @@ Please also review the Makefile for additional hints.
 Dependencies
 ------------

-* C++11 compiler (tested with GCC 4.9 and 6.2.0)
+* C++11 compiler (tested with GCC 4.9, 6.2.0, 8.1.0)

 Sorry. I really like closures and shared_ptr, so support
 for earlier compiler versions is unlikely.

-* btrfs-progs (tested with 4.1..4.14)
+* btrfs-progs (tested with 4.1..4.15.1)

 Needed for btrfs.h and ctree.h during compile.
 Not needed at runtime.
@@ -400,17 +424,16 @@ Dependencies
 This library is only required for a feature that was removed after v0.1.
 The lingering support code can be removed.

-* Linux kernel version: minimum 4.4.3, 4.11 or later recommended
+* Linux kernel version: *minimum* 4.4.107, *4.14.29 or later recommended*

-Don't bother trying to make Bees work with kernel versions older
-than 4.4.3. It may appear to work, but it won't end well: there are
-too many missing features and bugs to work around.
+Don't bother trying to make Bees work with kernel versions older than
+4.4.107. It may appear to work, but it won't end well: there are
+too many missing features and bugs (including data corruption bugs)
+to work around in older kernels.

-Kernel versions between 4.4.3 and 4.11 are usable with bees, but bees
-can trigger known performance bugs and hangs in dedup-related functions.
-
-When in doubt, use a newer kernel version. As of kernel 4.15.3 there
-is no released Linux kernel that has no relevant known bugs.
+Kernel versions between 4.4.107 and 4.14.29 are usable with bees,
+but bees can trigger known performance bugs and hangs in dedup-related
+functions.

 * markdown

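
The kernel version guidance in the updated README (minimum 4.4.107, 4.14.29 or later recommended) could be checked along these lines. This is a sketch only: `parse_kernel_version`, `classify_kernel`, and the `Support` enum are hypothetical names, and a plain numeric comparison ignores the README's note that fixes are only sometimes backported to older LTS kernels.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <tuple>

using KernelVersion = std::tuple<int, int, int>;

// Parse the leading "major.minor.patch" of a release string such as
// "4.14.29-generic". Components that are absent are treated as 0.
KernelVersion parse_kernel_version(const std::string &release)
{
    int major = 0, minor = 0, patch = 0;
    const int fields =
        std::sscanf(release.c_str(), "%d.%d.%d", &major, &minor, &patch);
    assert(fields >= 1);
    (void)fields; // silence unused-variable warnings when NDEBUG is set
    return std::make_tuple(major, minor, patch);
}

enum class Support { TooOld, Usable, Recommended };

// Thresholds are the ones stated in the README text of this commit.
Support classify_kernel(const KernelVersion &v)
{
    const KernelVersion minimum = std::make_tuple(4, 4, 107);
    const KernelVersion recommended = std::make_tuple(4, 14, 29);
    if (v < minimum) {
        return Support::TooOld;
    }
    if (v < recommended) {
        return Support::Usable;
    }
    return Support::Recommended;
}
```

On Linux the release string would typically come from the `release` field filled in by `uname(2)`.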