1
0
mirror of https://github.com/Zygo/bees.git synced 2025-07-05 10:02:27 +02:00

extent scan: refactor BeesCrawl, BeesScanMode*

The main gains here are:

* Move extent tree searches into BeesScanModeExtent so that they are
not slowed down by the BeesCrawl code, which was designed for the
much more specialized metadata in subvol trees.
* Enable short extent skipping now that BeesCrawl is out of the way.
* Stop enumerating btrfs subvols when in extent scan mode.

All this gets rid of >99% of unnecessary extent tree searches.
Incremental extent scan cycles now finish in milliseconds instead
of minutes.

BeesCrawl was never designed to cope with the structure and content of
the extent tree.  It would waste thousands of tree-search ioctl calls
reading and ignoring metadata items.

Performance was particularly bad when a binary search was involved, as any
binary search probe that landed in a metadata block group would read and
discard all the metadata items in the block group, sequentially, repeated
for each level of the binary search.  This was blocking implementation of
short extent skipping optimization for large extent size tiers, because
the skips were using thousands of tree searches to skip over only a few
hundred extent items.

Extent scan also had to read every extent item twice to do the
transid filtering, because BeesCrawl's interface discarded the relevant
information when it converted a `BtrfsTreeItem` into a `BeesFileRange`.
The cost of this extra fetch was negligible, but it could have been zero.

Fix this by:

* Copy the equivalent of `fetch_extents` from BeesCrawl into
`BeesScanModeExtent`, then give each of the extent scan crawlers its
own `BtrfsDataExtentTreeFetcher` instance.  This enables extent tree
searches to avoid pure (non-mixed) metadata block groups.  `BeesCrawl`
is now used only for its interface to `BeesRoots` for saving state in
`beescrawl.dat`, and never to determine the next extent tree item.

* Move subvol-specific parts of `BeesRoots` into a new class
`BeesScanModeSubvol` so that `BtrfsScanModeExtent` doesn't have to enable
or support them.  In particular, `bees -m4` no longer enumerates all
of the _subvol_ crawlers.  `BeesRoots` is still used to save and load
crawl state.

* Move several members from `BtrfsScanModeExtent` into a per-crawler
state object `SizeTier` to eliminate the need for some locks and to
maintain separate cache state for `BtrfsDataExtentTreeFetcher`.

* Reuse the `BtrfsTreeItem` to get the generation field for the transid
range filter.

* Avoid a few corner cases when handling errors, where extent scan might
drop an extent without scanning it, or fail to advance to the next extent.

* Enable the extent-skipping algorithm for large size tiers, now that
`BeesCrawl::fetch_extents` is no longer slowing it down.

* Add a debug stream interface which developers can easily turn on when
needed to inspect the decisions that extent scan is making.

* Track metrics that are more useful, particularly searches per extent
scanned, and fraction of extents that are skipped.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit is contained in:
Zygo Blaxell
2025-01-27 20:11:06 -05:00
parent b6446d7316
commit 183b6a5361
2 changed files with 416 additions and 225 deletions

View File

@ -521,7 +521,7 @@ class BeesCrawl {
bool fetch_extents();
void fetch_extents_harder();
bool restart_crawl();
bool restart_crawl_unlocked();
BeesFileRange bti_to_bfr(const BtrfsTreeItem &bti) const;
public:
@ -535,6 +535,7 @@ public:
void deferred(bool def_setting);
bool deferred() const;
bool finished() const;
bool restart_crawl();
};
class BeesScanMode;
@ -543,7 +544,8 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
shared_ptr<BeesContext> m_ctx;
BeesStringFile m_crawl_state_file;
map<uint64_t, shared_ptr<BeesCrawl>> m_root_crawl_map;
using CrawlMap = map<uint64_t, shared_ptr<BeesCrawl>>;
CrawlMap m_root_crawl_map;
mutex m_mutex;
uint64_t m_crawl_dirty = 0;
uint64_t m_crawl_clean = 0;
@ -562,7 +564,7 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
condition_variable m_stop_condvar;
bool m_stop_requested = false;
void insert_new_crawl();
CrawlMap insert_new_crawl();
Fd open_root_nocache(uint64_t root);
Fd open_root_ino_nocache(uint64_t root, uint64_t ino);
uint64_t transid_max_nocache();
@ -579,12 +581,13 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
bool crawl_batch(shared_ptr<BeesCrawl> crawl);
void clear_caches();
friend class BeesScanModeExtent;
shared_ptr<BeesCrawl> insert_root(const BeesCrawlState &bcs);
friend class BeesCrawl;
friend class BeesFdCache;
friend class BeesScanMode;
friend class BeesScanModeSubvol;
friend class BeesScanModeExtent;
public:
BeesRoots(shared_ptr<BeesContext> ctx);