bees Gotchas
============

C++ Exceptions
--------------

bees is very paranoid about the data it gets from btrfs, and if btrfs
does anything bees does not expect, bees will throw an exception and move
on without touching the offending data. This will trigger a stack trace
to the log containing data which is useful for developers to understand
what happened.

In all cases C++ exceptions in bees are harmless to data on the
filesystem. bees handles most exceptions by aborting processing of
the current extent and moving to the next extent. In some cases an
exception may occur in a critical bees thread, which will stop the bees
process from making any further progress; however, these cases are rare
and are typically caused by unusual filesystem conditions (e.g. [freshly
formatted filesystem with no
data](https://github.com/Zygo/bees/issues/93)) or lack of memory or
other resources.

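In code terms this is a catch-and-continue loop: each extent is processed
inside a try/catch, the exception is logged, and the scan moves on to the
next extent. The sketch below illustrates the pattern only; it is not the
actual bees code, and its names are hypothetical.

```c++
#include <exception>
#include <iostream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a bees work item: one extent to scan.
struct Extent { long long bytenr; };

// Hypothetical scan step.  Throws when btrfs hands back something
// unexpected, just as the real scanner does.
void scan_extent(const Extent &e)
{
    if (e.bytenr % 2) {
        throw std::runtime_error("unexpected extent layout");
    }
    // ...hash blocks, look up matches, submit dedupe requests...
}

int main()
{
    const std::vector<Extent> extents{{4096}, {8193}, {16384}};
    for (const auto &e : extents) {
        try {
            scan_extent(e);
        } catch (const std::exception &ex) {
            // Log and move on: the offending extent is skipped,
            // and nothing on the filesystem is modified.
            std::cerr << "exception scanning extent " << e.bytenr
                      << ": " << ex.what() << "\n";
        }
    }
    return 0;
}
```
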
The following are common cases that users may encounter:

* If a snapshot is deleted, bees will generate a burst of exceptions for
  references to files in the snapshot that no longer exist. This lasts
  until the FD caches are cleared, usually a few minutes with default
  btrfs mount options. These generally look like:

  `std::system_error: BTRFS_IOC_TREE_SEARCH_V2: [path] at fs.cc:844: No such file or directory`

* If data is modified at the same time it is being scanned, bees will get
  an inconsistent version of the data layout in the filesystem, causing
  the `ExtentWalker` class to throw various constraint-check exceptions
  (a sketch of this idiom appears after this list). The exception causes
  bees to retry the extent in a later filesystem scan (hopefully when the
  file is no longer being modified). The exception text is similar to:

  `std::runtime_error: fm.rbegin()->flags() = 776 failed constraint check (fm.rbegin()->flags() & FIEMAP_EXTENT_LAST) at extentwalker.cc:229`

  but the line number or specific code fragment may vary.

* If there are too many possible matching blocks within a pair of extents,
  bees will loop billions of times considering all possibilities. This is
  a waste of time, so an exception is currently used to break out of such
  loops early. The exception text in this case is:

  `FIXME: bailing out here, need to fix this further up the call stack`

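The `ExtentWalker` constraint-check message quoted above comes from an
assertion-style idiom: evaluate an expression, and if a required condition
on it does not hold, throw a `std::runtime_error` whose text records the
expression, its observed value, and the source location. The sketch below
shows that idiom in isolation, with hypothetical names; it is not the macro
bees actually uses.

```c++
#include <iostream>
#include <sstream>
#include <stdexcept>

// Hypothetical constraint-check macro: if `cond` is false, throw a
// std::runtime_error recording the checked value, the condition, and the
// source location of the failed check.
#define CHECK_CONSTRAINT(value, cond)                                  \
    do {                                                               \
        if (!(cond)) {                                                 \
            std::ostringstream oss;                                    \
            oss << #value " = " << (value)                             \
                << " failed constraint check (" #cond ") at "          \
                << __FILE__ << ":" << __LINE__;                        \
            throw std::runtime_error(oss.str());                       \
        }                                                              \
    } while (0)

int main()
{
    try {
        // Example: a FIEMAP-style flags word that is missing an expected
        // bit, as in the ExtentWalker message quoted above.
        const unsigned flags = 776;
        const unsigned FIEMAP_EXTENT_LAST = 1;  // value from linux/fiemap.h
        CHECK_CONSTRAINT(flags, flags & FIEMAP_EXTENT_LAST);
    } catch (const std::runtime_error &ex) {
        std::cerr << ex.what() << "\n";  // logged; the extent is retried later
    }
    return 0;
}
```
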
Snapshots
---------

bees can dedupe filesystems with many snapshots, but bees only does
well in this situation if bees was running on the filesystem from
the beginning.

Each time bees dedupes an extent that is referenced by a snapshot,
the entire metadata page in the snapshot subvol (16KB by default) must
be CoWed in btrfs. This can result in a substantial increase in btrfs
metadata size if there are many snapshots on a filesystem.

Normally, metadata is small (less than 1% of the filesystem) and dedupe
hit rates are large (10-40% of the filesystem), so the increase in
metadata size is offset by much larger reductions in data size and the
total space used by the entire filesystem is reduced.

If a subvol is deduped _before_ a snapshot is created, the snapshot will
have the same deduplication as the subvol. This does _not_ result in
unusually large metadata sizes. If a snapshot is made after bees has
fully scanned the origin subvol, bees can avoid scanning most of the
data in the snapshot subvol, as it will be provably identical to the
origin subvol that was already scanned.

If a subvol is deduped _after_ a snapshot is created, the origin and
snapshot subvols must be deduplicated separately. In the worst case, this
will double the amount of reading the bees scanner must perform, and will
also double the amount of btrfs metadata used for the snapshot; however,
the "worst case" is reached at a dedupe hit rate of only 1%, so a doubling
of metadata size is certain for all but the most unique data sets. Also,
bees will not be able to free any space until the last snapshot has been
scanned and deduped, so the payoff in data space savings is deferred until
the metadata has almost finished expanding.

If a subvol is deduped after _many_ snapshots have been created, all
subvols must be deduplicated individually. In the worst case, this will
multiply the scanning work and metadata size by the number of snapshots.
For 100 snapshots this can mean a 100x growth in metadata size and
bees scanning time, which typically exceeds the possible savings from
reducing the data size by dedupe. In such cases using bees will result
in a net increase in disk space usage that persists until the snapshots
are deleted.

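A back-of-envelope calculation shows why 100 snapshots can turn dedupe into
a net loss. The figures below are illustrative assumptions taken from the
ranges quoted above (metadata around 1% of the filesystem, a 30% dedupe hit
rate, 100 pre-existing snapshots on 1 TiB of data), not measurements of any
real filesystem: in this worst case the duplicated metadata (~1 TiB) far
outweighs the data saved (~300 GiB) until the snapshots are deleted.

```c++
#include <iostream>

int main()
{
    // Illustrative assumptions, not measurements.
    const double data_gib          = 1024.0;  // 1 TiB of data
    const double metadata_fraction = 0.01;    // metadata is ~1% of the filesystem
    const double dedupe_hit_rate   = 0.30;    // 30% of the data is duplicate
    const double snapshots         = 100.0;   // snapshots created before bees ran

    const double metadata_before = data_gib * metadata_fraction;
    // Worst case: metadata grows by roughly the number of snapshots.
    const double metadata_after  = metadata_before * snapshots;
    const double data_savings    = data_gib * dedupe_hit_rate;
    const double net_change      = (metadata_after - metadata_before) - data_savings;

    std::cout << "metadata before dedupe: ~" << metadata_before << " GiB\n"
              << "metadata after dedupe:  ~" << metadata_after  << " GiB\n"
              << "data space saved:       ~" << data_savings    << " GiB\n"
              << "net change in usage:    ~" << net_change      << " GiB (positive = growth)\n";
    return 0;
}
```
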
Snapshot case studies
---------------------

* bees running on an empty filesystem
    * filesystem is mkfsed
    * bees is installed and starts running
    * data is written to the filesystem
    * bees dedupes the data as it appears
    * a snapshot is made of the data
        * The snapshot will already be 99% deduped, so the metadata will
          not expand very much because only 1% of the data in the snapshot
          must be deduped.
    * more snapshots are made of the data
        * as long as dedupe has been completed on the origin subvol,
          bees will quickly scan each new snapshot because it can skip
          all the previously scanned data. Metadata usage remains low
          (it may even shrink because there are fewer csums).

* bees installed on a non-empty filesystem with snapshots
    * filesystem is mkfsed
    * data is written to the filesystem
    * multiple snapshots are made of the data
    * bees is installed and starts running
    * bees dedupes each snapshot individually
        * The snapshot metadata will no longer be shared, resulting in
          substantial growth of metadata usage.
        * Disk space savings do not occur until bees processes the
          last snapshot reference to data.

Other Gotchas
-------------

* bees avoids the [slow backrefs kernel bug](btrfs-kernel.md) by
  measuring the time required to perform `LOGICAL_INO` operations
  (see the timing sketch after this list). If an extent requires over
  0.1 kernel CPU seconds to perform a `LOGICAL_INO` ioctl, then bees
  blacklists the extent and avoids referencing it in future operations.
  In most cases, fewer than 0.1% of extents in a filesystem must be
  avoided this way. This results in short write latency spikes, as
  btrfs will not allow writes to the filesystem while `LOGICAL_INO`
  is running. Generally the CPU spends most of the runtime of the
  `LOGICAL_INO` ioctl executing kernel code, so on a single-core CPU
  the entire system can freeze up for a second during operations on
  toxic extents.

* If a process holds a directory FD open, the subvol containing the
  directory cannot be deleted (`btrfs sub del` will start the deletion
  process, but it will not proceed past the first open directory FD).
  `btrfs-cleaner` will simply skip over the directory *and all of its
  children* until the FD is closed. bees avoids this gotcha by closing
  all of the FDs in its directory FD cache every 10 btrfs transactions.

* If a file is deleted while bees is caching an open FD to the file,
  bees continues to scan the file. For very large files (e.g. VM
  images), the deletion of the file can be delayed indefinitely.
  To limit this delay, bees closes all FDs in its file FD cache every
  10 btrfs transactions (see the cache-aging sketch after this list).

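The toxic-extent check in the first bullet boils down to timing the kernel
CPU cost of one call and blacklisting the extent when that cost exceeds a
threshold. The sketch below shows that pattern in isolation: the names are
hypothetical, the expensive call is a placeholder rather than a real
`LOGICAL_INO` ioctl, and this is not the actual bees code.

```c++
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1  // for RUSAGE_THREAD on Linux
#endif

#include <cstdint>
#include <iostream>
#include <set>
#include <sys/resource.h>

// Kernel (system) CPU time consumed by the calling thread so far, in seconds.
static double thread_kernel_cpu_seconds()
{
    struct rusage ru {};
    getrusage(RUSAGE_THREAD, &ru);
    return ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

// Extents whose backref lookups were too expensive; never referenced again.
static std::set<std::uint64_t> toxic_extents;

// Placeholder for the expensive kernel call (the real code would issue a
// LOGICAL_INO ioctl here and collect the extent's references).
static void lookup_backrefs(std::uint64_t /*bytenr*/) {}

static bool check_extent(std::uint64_t bytenr)
{
    const double before = thread_kernel_cpu_seconds();
    lookup_backrefs(bytenr);
    const double kernel_cpu = thread_kernel_cpu_seconds() - before;

    if (kernel_cpu > 0.1) {
        // Too expensive: remember the extent and avoid it from now on.
        toxic_extents.insert(bytenr);
        std::cerr << "toxic extent " << bytenr << ": "
                  << kernel_cpu << " kernel CPU seconds\n";
        return false;
    }
    return true;
}

int main()
{
    check_extent(123456789);  // hypothetical extent bytenr
    return 0;
}
```
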
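The last two bullets describe the same defence: cached file descriptors are
tied to the btrfs transaction counter, and the whole cache is dropped once
the counter has advanced by 10. Below is a minimal sketch of that aging
scheme, assuming the current transid is obtained elsewhere; the names are
hypothetical and this is not the actual bees cache implementation.

```c++
#include <cstdint>
#include <fcntl.h>
#include <map>
#include <unistd.h>

// A tiny FD cache that is emptied every `max_transid_age` btrfs transactions,
// so cached FDs cannot pin deleted files or subvols indefinitely.
class FdCache {
public:
    explicit FdCache(std::uint64_t max_transid_age = 10)
        : m_max_age(max_transid_age) {}

    // Called with the current btrfs transid (obtained elsewhere).  If the
    // cache has aged past the limit, close everything and start over.
    void age(std::uint64_t current_transid)
    {
        if (current_transid - m_base_transid >= m_max_age) {
            clear();
            m_base_transid = current_transid;
        }
    }

    void insert(std::uint64_t ino, int fd) { m_fds[ino] = fd; }

    int lookup(std::uint64_t ino) const
    {
        auto it = m_fds.find(ino);
        return it == m_fds.end() ? -1 : it->second;
    }

    void clear()
    {
        for (auto &entry : m_fds) {
            close(entry.second);  // releases the inode/subvol reference
        }
        m_fds.clear();
    }

    ~FdCache() { clear(); }

private:
    std::uint64_t m_max_age;
    std::uint64_t m_base_transid = 0;
    std::map<std::uint64_t, int> m_fds;  // inode number -> open FD
};

int main()
{
    FdCache cache;
    const int fd = open("/dev/null", O_RDONLY);  // stand-in for a cached file FD
    cache.insert(257, fd);
    cache.age(5);   // not old enough yet: cache kept
    cache.age(12);  // 10+ transactions later: all FDs closed
    return 0;
}
```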