mirror of https://github.com/Zygo/bees.git (synced 2025-05-17 21:35:45 +02:00)
README: update the state of bees and the kernel for v4.14

Read-only snapshots have always just worked. Remove them from the
"untested" list.

nodatasum (and therefore nodatacow) inodes are simply ignored. This
seems like the right thing to do since deduping a nodatacow extent turns
it into a datacow extent, which seems contrary to administrator wishes
implied by the nodatacow bit. We probably need an option to override
that assumption.

Clarify why converted ext[234] filesystems may cause problems and the
nature of those problems.

Assorted minor editorial changes.

Discuss calculation of the balance limit parameter when ensuring
sufficient metadata space.

Update kernel version bug/fix/feature lists, including LOGICAL_INO_V2.

Annotate kernel workaround list with known kernel versions that make
the workarounds necessary.

Remove reference to 'DEFRAG_RANGE' as bees requires much more control
over data placement than this interface can offer. It's easy enough to
create a new ioctl to implement bees requirements once it's known what
those requirements are.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
parent 305ab5dbfa
commit dc7360397e
75
README.md
@@ -134,11 +134,6 @@ performance by caching, but really fixing this requires rewriting the
 crawler to scan the btrfs extent tree directly instead of the subvol
 FS trees.
 
-* Bees had support for multiple worker threads in the past; however,
-this was removed because it made Bees too aggressive to coexist with
-other applications on the same machine. It also hit the *slow backrefs*
-on N CPU cores instead of just one.
-
 * Block reads are currently more allocation- and CPU-intensive than they
 should be, especially for filesystems on SSD where the IO overhead is
 much smaller. This is a problem for power-constrained environments
@@ -171,6 +166,7 @@ Bees has been tested in combination with the following:
 * Large (>16M) extents
 * Huge files (>1TB--although Btrfs performance on such files isn't great in general)
 * filesystems up to 25T bytes, 100M+ files
+* btrfs read-only snapshots
 
 Bad Btrfs Feature Interactions
 ------------------------------
@@ -179,15 +175,13 @@ Bees has not been tested with the following, and undesirable interactions may oc
 
 * Non-4K filesystem data block size (should work if recompiled)
 * Non-equal hash (SUM) and filesystem data block (CLONE) sizes (probably never will work)
-* btrfs read-only snapshots (never tested, probably wouldn't work well)
-* btrfs send/receive (receive is probably OK, but send requires RO snapshots. See above)
+* btrfs send/receive (receive is probably OK, but send could be confused?)
 * btrfs qgroups (never tested, no idea what might happen)
 * btrfs seed filesystems (does anyone even use those?)
 * btrfs autodefrag mount option (never tested, could fight with Bees)
-* btrfs nodatacow inode attribute (needs datasum detection on extents, skipped for now)
-* btrfs nodatacow mount option (*could* work, but might not, skipped due to above note)
+* btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
 * btrfs out-of-tree kernel patches (e.g. in-band dedup or encryption)
-* btrfs-convert from ext2/3/4 (never tested)
+* btrfs-convert from ext2/3/4 (never tested, might run out of space or ignore significant portions of the filesystem due to sanity checks)
 * btrfs mixed block groups (don't know a reason why it would *not* work, but never tested)
 * open(O_DIRECT)
 * Filesystems mounted *without* the flushoncommit option
@@ -195,7 +189,7 @@ Bees has not been tested with the following, and undesirable interactions may oc
 Other Caveats
 -------------
 
-* btrfs balance will invalidate parts of the dedup table. Bees will
+* btrfs balance will invalidate parts of the dedup hash table. Bees will
 happily rebuild the table, but it will have to scan all the blocks
 again.
 
@@ -206,17 +200,35 @@ Other Caveats
 
 * Bees creates temporary files (with O_TMPFILE) and uses them to split
 and combine extents elsewhere in btrfs. These will take up to 2GB
-during normal operation.
+of disk space per thread during normal operation.
 
 * Like all deduplicators, Bees will replace data blocks with metadata
-references. It is a good idea to ensure there are several GB of
-unallocated space (see `btrfs fi df`) on the filesystem before running
-Bees for the first time. Use
+references. It is a good idea to ensure there is sufficient unallocated
+space (see `btrfs fi usage`) on the filesystem to allow the metadata
+to multiply in size by the number of snapshots before running Bees
+for the first time. Use
 
-btrfs balance start -dusage=100,limit=1 /your/filesystem
+btrfs balance start -dusage=100,limit=N /your/filesystem
 
-If possible, raise the `limit` parameter to the current size of metadata
-usage (from `btrfs fi df`) plus 1.
+where the `limit` parameter 'N' should be calculated as follows:
+
+* start with the current size of metadata usage (from `btrfs fi
+df`) in GB, plus 1
+
+* multiply by the proportion of disk space in subvols with
+snapshots (i.e. if there are no snapshots, multiply by 0;
+if all of the data is shared between at least one origin
+and one snapshot subvol, multiply by 1)
+
+* multiply by the number of snapshots (i.e. if there is only
+one subvol, multiply by 0; if there are 3 snapshots and one
+origin subvol, multiply by 3)
+
+`limit = GB_metadata * (disk_space_in_snapshots / total_disk_space) * number_of_snapshots`
+
+Monitor unallocated space to ensure that the filesystem never runs out
+of metadata space (whether Bees is running or not--this is a general
+btrfs requirement).
 
 
 A Brief List Of Btrfs Kernel Bugs
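The `limit` calculation introduced in the hunk above can be sketched in Python. This is only an illustration and not part of bees: the function name `balance_limit` and its parameter names are hypothetical. The sketch follows the bulleted steps (metadata size in GB, plus 1), whereas the one-line formula in the README omits the "plus 1". Rounding up to a whole number is an assumption, made because `limit` is passed to `btrfs balance` as a count.

```python
import math


def balance_limit(metadata_gb, snapshot_space_fraction, number_of_snapshots):
    """Estimate N for `btrfs balance start -dusage=100,limit=N`.

    metadata_gb: current metadata usage in GB (as reported by `btrfs fi df`).
    snapshot_space_fraction: proportion of disk space in subvols with
        snapshots, from 0.0 (no snapshots) to 1.0 (all data is shared
        between an origin and at least one snapshot).
    number_of_snapshots: 0 if there is only one subvol, 3 for three
        snapshots plus one origin subvol, and so on.
    """
    if not 0.0 <= snapshot_space_fraction <= 1.0:
        raise ValueError("snapshot_space_fraction must be between 0 and 1")
    if number_of_snapshots < 0:
        raise ValueError("number_of_snapshots must not be negative")
    # Step 1: metadata usage in GB, plus 1.
    # Step 2: scale by the proportion of space held in snapshotted subvols.
    # Step 3: scale by the number of snapshots.
    estimate = (metadata_gb + 1) * snapshot_space_fraction * number_of_snapshots
    # Assumption: round up so that the result is a whole number.
    return math.ceil(estimate)


if __name__ == "__main__":
    n = balance_limit(metadata_gb=10, snapshot_space_fraction=0.5,
                      number_of_snapshots=3)
    print(f"btrfs balance start -dusage=100,limit={n} /your/filesystem")
```

With 10 GB of metadata, half of the disk space in snapshotted subvols, and 3 snapshots, the estimate is (10 + 1) * 0.5 * 3 = 16.5, which the sketch rounds up to 17.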
@@ -229,18 +241,27 @@ Missing features (usually not available in older LTS kernels):
 * 3.16: `SEARCH_V2` ioctl added. Bees could use `SEARCH` instead.
 * 4.2: `FILE_EXTENT_SAME` no longer updates mtime, can be used at EOF.
 
+Future features (kernel features Bees does not yet use, but may rely on
+in the future):
+
+* 4.14: `LOGICAL_INO_V2` allows userspace to create forward and backward
+reference maps to entire physical extents with a single ioctl call,
+and raises the limit of 2730 references per extent. Bees has not yet
+been rewritten to take full advantage of these features.
+
 Bug fixes (sometimes included in older LTS kernels):
 
+* Bugs fixed prior to 4.4.3 are not listed here.
 * 4.5: hang in the `INO_PATHS` ioctl used by Bees.
 * 4.5: use-after-free in the `FILE_EXTENT_SAME` ioctl used by Bees.
 * 4.6: lost inodes after a rename, crash, and log tree replay
 (triggered by the fsync() while writing `beescrawl.dat`).
 * 4.7: *slow backref* bug no longer triggers a softlockup panic. It still
-too long to resolve a block address to a root/inode/offset triple.
+takes too long to resolve a block address to a root/inode/offset triple.
 * 4.10: reduced CPU time cost of the LOGICAL_INO ioctl and dedup
 backref processing in general.
-* 4.13 integration trees: 053582a7d423 btrfs: add cond_resched() calls
-when resolving backrefs
+* 4.11: yet another dedup deadlock case is fixed.
+* 4.14: backref performance improvements make LOGICAL_INO even faster.
 
 Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 
@@ -251,7 +272,7 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 measuring the time the kernel spends performing certain operations
 and permanently blacklisting any extent or hash where the kernel
 starts to get slow. Inside Bees, such blocks are marked as 'toxic'
-hash/block addresses.
+hash/block addresses. *Needs to be retested after v4.14.*
 
 * `LOGICAL_INO` output is arbitrarily limited to 2730 references
 even if more buffer space is provided for results. Once this number
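The timing-based workaround described in the hunk above (measure how long an operation takes, then permanently blacklist any extent or hash where it gets slow) can be illustrated with a short Python sketch. This is not bees code: `ToxicTracker`, its method names, the 0.1 second default threshold, and the injectable clock are all hypothetical and chosen only to show the idea.

```python
import time


class ToxicTracker:
    """Remember keys (e.g. hashes or block addresses) whose operations were slow."""

    def __init__(self, threshold_seconds=0.1, clock=time.monotonic):
        # threshold_seconds is a hypothetical cutoff, not a value taken from bees.
        self.threshold_seconds = threshold_seconds
        self.clock = clock
        self.toxic = set()

    def run(self, key, operation):
        """Run operation() unless key is already toxic; mark key toxic if it was slow."""
        if key in self.toxic:
            # Permanently blacklisted: skip the expensive call entirely.
            return None
        start = self.clock()
        result = operation()
        elapsed = self.clock() - start
        if elapsed > self.threshold_seconds:
            self.toxic.add(key)
        return result
```

Passing the clock as a parameter keeps the sketch testable without real delays. A caller would wrap each potentially slow lookup in `run()` and treat a `None` result for a known-toxic key as "do not dedup this block".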
@@ -262,6 +283,7 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 This places an obvious limit on dedup efficiency for extremely common
 blocks or filesystems with many snapshots (although this limit is
 far greater than the effective limit imposed by the *slow backref* bug).
+*Fixed in v4.14.*
 
 * `LOGICAL_INO` on compressed extents returns a list of root/inode/offset
 tuples matching the extent bytenr of its argument. On uncompressed
@@ -271,7 +293,7 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 references requires calling `LOGICAL_INO` for every single block of
 the extent. This is undesirable behavior for Bees, which wants a
 list of all extent refs referencing a data extent (i.e. Bees wants
-the compressed-extent behavior in all cases).
+the compressed-extent behavior in all cases). *Fixed in v4.14.*
 
 * `LOGICAL_INO` is only called from one thread at any time per process.
 This means at most one core is irretrievably stuck in this ioctl.
@@ -281,13 +303,6 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
 or prealloc. Bees avoids feedback loops this can generate while
 attempting to replace extents over 16MB in length.
 
-* `DEFRAG_RANGE` is useless. The ioctl attempts to implement `btrfs
-fi defrag` in the kernel, and will arbitrarily defragment more or
-less than the range requested to match the behavior expected from the
-userspace tool. Bees implements its own defrag instead, copying data
-to a temporary file and using the `FILE_EXTENT_SAME` ioctl to replace
-precisely the specified range of offending fragmented blocks.
-
 * If the `fsync()` in `BeesTempFile::make_copy` is removed, the filesystem
 hangs within a few hours, requiring a reboot to recover. On the other
 hand, the `fsync()` only costs about 8% of overall performance.