From dc7360397e81c9c17c0be5bb99f53031d95d369c Mon Sep 17 00:00:00 2001
From: Zygo Blaxell
Date: Sun, 7 Jan 2018 03:41:45 -0500
Subject: [PATCH] README: update the state of bees and the kernel for v4.14

Read-only snapshots have always just worked. Remove them from the
"untested" list.

nodatasum (and therefore nodatacow) inodes are simply ignored. This
seems like the right thing to do since deduping a nodatacow extent turns
it into a datacow extent, which seems contrary to administrator wishes
implied by the nodatacow bit. We probably need an option to override
that assumption.

Clarify why converted ext[234] filesystems may cause problems and the
nature of those problems.

Assorted minor editorial changes.

Discuss calculation of the balance limit parameter when ensuring
sufficient metadata space.

Update kernel version bug/fix/feature lists, including LOGICAL_INO_V2.

Annotate kernel workaround list with known kernel versions that make
the workarounds necessary.

Remove reference to 'DEFRAG_RANGE' as bees requires much more control
over data placement than this interface can offer. It's easy enough
to create a new ioctl to implement bees requirements once it's known
what those requirements are.

Signed-off-by: Zygo Blaxell
---
 README.md | 75 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/README.md b/README.md
index 3f9d635..6d483f7 100644
--- a/README.md
+++ b/README.md
@@ -134,11 +134,6 @@ performance by caching, but really fixing this requires rewriting the
 crawler to scan the btrfs extent tree directly instead of the subvol
 FS trees.
 
-* Bees had support for multiple worker threads in the past; however,
-this was removed because it made Bees too aggressive to coexist with
-other applications on the same machine. It also hit the *slow backrefs*
-on N CPU cores instead of just one.
-
 * Block reads are currently more allocation- and CPU-intensive than they
 should be, especially for filesystems on SSD where the IO overhead is
 much smaller. This is a problem for power-constrained environments
@@ -171,6 +166,7 @@ Bees has been tested in combination with the following:
 * Large (>16M) extents
 * Huge files (>1TB--although Btrfs performance on such files isn't great in general)
 * filesystems up to 25T bytes, 100M+ files
+* btrfs read-only snapshots
 
 Bad Btrfs Feature Interactions
 ------------------------------
@@ -179,15 +175,13 @@ Bees has not been tested with the following, and undesirable interactions may oc
 
 * Non-4K filesystem data block size (should work if recompiled)
 * Non-equal hash (SUM) and filesystem data block (CLONE) sizes (probably never will work)
-* btrfs read-only snapshots (never tested, probably wouldn't work well)
-* btrfs send/receive (receive is probably OK, but send requires RO snapshots. See above)
+* btrfs send/receive (receive is probably OK, but send could be confused?)
 * btrfs qgroups (never tested, no idea what might happen)
 * btrfs seed filesystems (does anyone even use those?)
 * btrfs autodefrag mount option (never tested, could fight with Bees)
-* btrfs nodatacow inode attribute (needs datasum detection on extents, skipped for now)
-* btrfs nodatacow mount option (*could* work, but might not, skipped due to above note)
+* btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
 * btrfs out-of-tree kernel patches (e.g. in-band dedup or encryption)
-* btrfs-convert from ext2/3/4 (never tested)
+* btrfs-convert from ext2/3/4 (never tested, might run out of space or ignore significant portions of the filesystem due to sanity checks)
 * btrfs mixed block groups (don't know a reason why it would *not* work, but never tested)
 * open(O_DIRECT)
 * Filesystems mounted *without* the flushoncommit option
@@ -195,7 +189,7 @@ Bees has not been tested with the following, and undesirable interactions may oc
 Other Caveats
 -------------
 
-* btrfs balance will invalidate parts of the dedup table. Bees will
+* btrfs balance will invalidate parts of the dedup hash table. Bees will
   happily rebuild the table, but it will have to scan all the blocks
   again.
 
@@ -206,17 +200,35 @@ Other Caveats
 
 * Bees creates temporary files (with O_TMPFILE) and uses them to split
   and combine extents elsewhere in btrfs. These will take up to 2GB
-  during normal operation.
+  of disk space per thread during normal operation.
 
 * Like all deduplicators, Bees will replace data blocks with metadata
-  references. It is a good idea to ensure there are several GB of
-  unallocated space (see `btrfs fi df`) on the filesystem before running
-  Bees for the first time. Use
+  references. It is a good idea to ensure there is sufficient unallocated
+  space (see `btrfs fi usage`) on the filesystem to allow the metadata
+  to multiply in size by the number of snapshots before running Bees
+  for the first time. Use
 
-        btrfs balance start -dusage=100,limit=1 /your/filesystem
+        btrfs balance start -dusage=100,limit=N /your/filesystem
 
-  If possible, raise the `limit` parameter to the current size of metadata
-  usage (from `btrfs fi df`) plus 1.
+  where the `limit` parameter 'N' should be calculated as follows:
+
+    * start with the current size of metadata usage (from `btrfs fi
+      df`) in GB, plus 1
+
+    * multiply by the proportion of disk space in subvols with
+      snapshots (i.e. if there are no snapshots, multiply by 0;
+      if all of the data is shared between at least one origin
+      and one snapshot subvol, multiply by 1)
+
+    * multiply by the number of snapshots (i.e. if there is only
+      one subvol, multiply by 0; if there are 3 snapshots and one
+      origin subvol, multiply by 3)
+
+  `limit = GB_metadata * (disk_space_in_snapshots / total_disk_space) * number_of_snapshots`
+
+  Monitor unallocated space to ensure that the filesystem never runs out
+  of metadata space (whether Bees is running or not--this is a general
+  btrfs requirement).
 
 
 A Brief List Of Btrfs Kernel Bugs
@@ -229,18 +241,27 @@ Missing features (usually not available in older LTS kernels):
 * 3.16: `SEARCH_V2` ioctl added. Bees could use `SEARCH` instead.
 * 4.2: `FILE_EXTENT_SAME` no longer updates mtime, can be used at EOF.
 
+Future features (kernel features Bees does not yet use, but may rely on
+in the future):
+
+* 4.14: `LOGICAL_INO_V2` allows userspace to create forward and backward
+  reference maps to entire physical extents with a single ioctl call,
+  and raises the limit of 2730 references per extent. Bees has not yet
+  been rewritten to take full advantage of these features.
+
 Bug fixes (sometimes included in older LTS kernels):
 
+* Bugs fixed prior to 4.4.3 are not listed here.
 * 4.5: hang in the `INO_PATHS` ioctl used by Bees.
 * 4.5: use-after-free in the `FILE_EXTENT_SAME` ioctl used by Bees.
 * 4.6: lost inodes after a rename, crash, and log tree replay
   (triggered by the fsync() while writing `beescrawl.dat`).
 * 4.7: *slow backref* bug no longer triggers a softlockup panic. It still
-  too long to resolve a block address to a root/inode/offset triple.
+  takes too long to resolve a block address to a root/inode/offset triple.
 * 4.10: reduced CPU time cost of the LOGICAL_INO ioctl and dedup
   backref processing in general.
-* 4.13 integration trees: 053582a7d423 btrfs: add cond_resched() calls
-  when resolving backrefs
+* 4.11: yet another dedup deadlock case is fixed.
+* 4.14: backref performance improvements make LOGICAL_INO even faster.
 
 Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 
@@ -251,7 +272,7 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
   measuring the time the kernel spends performing certain operations
   and permanently blacklisting any extent or hash where the kernel
   starts to get slow. Inside Bees, such blocks are marked as 'toxic'
-  hash/block addresses.
+  hash/block addresses. *Needs to be retested after v4.14.*
 
 * `LOGICAL_INO` output is arbitrarily limited to 2730 references
   even if more buffer space is provided for results. Once this number
@@ -262,6 +283,7 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
   This places an obvious limit on dedup efficiency for extremely common
   blocks or filesystems with many snapshots (although this limit is
   far greater than the effective limit imposed by the *slow backref* bug).
+  *Fixed in v4.14.*
 
 * `LOGICAL_INO` on compressed extents returns a list of root/inode/offset
   tuples matching the extent bytenr of its argument. On uncompressed
@@ -271,7 +293,7 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
   references requires calling `LOGICAL_INO` for every single block of
   the extent. This is undesirable behavior for Bees, which wants a
   list of all extent refs referencing a data extent (i.e. Bees wants
-  the compressed-extent behavior in all cases).
+  the compressed-extent behavior in all cases). *Fixed in v4.14.*
 
 * `LOGICAL_INO` is only called from one thread at any time per process.
   This means at most one core is irretrievably stuck in this ioctl.
@@ -281,13 +303,6 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
   or prealloc. Bees avoids feedback loops this can generate while
   attempting to replace extents over 16MB in length.
 
-* `DEFRAG_RANGE` is useless. The ioctl attempts to implement `btrfs
-  fi defrag` in the kernel, and will arbitrarily defragment more or
-  less than the range requested to match the behavior expected from the
-  userspace tool. Bees implements its own defrag instead, copying data
-  to a temporary file and using the `FILE_EXTENT_SAME` ioctl to replace
-  precisely the specified range of offending fragmented blocks.
-
 * If the `fsync()` in `BeesTempFile::make_copy` is removed, the filesystem
   hangs within a few hours, requiring a reboot to recover. On the other
   hand, the `fsync()` only costs about 8% of overall performance.
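
The `limit` calculation that this patch adds to the README can be sketched as a small helper. This is only an illustration and not part of bees: the function name `balance_limit` and its parameter names are hypothetical. It follows the bulleted steps, including the "plus 1" that the one-line formula leaves out. Rounding up to a whole number is an assumption, since the README text does not say how to round.

```python
import math


def balance_limit(metadata_gb, snapshot_space, total_space, num_snapshots):
    """Estimate N for `btrfs balance start -dusage=100,limit=N`.

    Hypothetical helper; all names are illustrative.
    metadata_gb    -- current metadata usage in GB (from `btrfs fi df`)
    snapshot_space -- disk space held in subvols that have snapshots
    total_space    -- total disk space in use (same unit as snapshot_space)
    num_snapshots  -- number of snapshots (0 if there is only one subvol)
    """
    if total_space <= 0:
        raise ValueError("total_space must be positive")
    # Step 1: metadata usage in GB, plus 1.
    base = metadata_gb + 1
    # Step 2: proportion of disk space in subvols with snapshots (0..1).
    proportion = snapshot_space / total_space
    # Step 3: multiply by the number of snapshots.
    # Assumption: round up so the result is a whole number.
    return math.ceil(base * proportion * num_snapshots)


# 10 GB of metadata, half of the data in snapshotted subvols, 3 snapshots.
print(balance_limit(10, 500, 1000, 3))  # prints 17
```

With no snapshots the result is 0, which matches the "multiply by 0" cases in the bullets.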