mirror of
https://github.com/Zygo/bees.git
synced 2025-07-05 10:02:27 +02:00
extent scan: refactor BeesCrawl, BeesScanMode*
The main gains here are:

 * Move extent tree searches into BeesScanModeExtent so that they are not slowed down by the BeesCrawl code, which was designed for the much more specialized metadata in subvol trees.
 * Enable short extent skipping now that BeesCrawl is out of the way.
 * Stop enumerating btrfs subvols when in extent scan mode.

All this gets rid of >99% of unnecessary extent tree searches. Incremental extent scan cycles now finish in milliseconds instead of minutes.

BeesCrawl was never designed to cope with the structure and content of the extent tree. It would waste thousands of tree-search ioctl calls reading and ignoring metadata items. Performance was particularly bad when a binary search was involved: any binary search probe that landed in a metadata block group would read and discard all the metadata items in the block group, sequentially, and this was repeated for each level of the binary search.

This was blocking implementation of the short extent skipping optimization for large extent size tiers, because the skips were using thousands of tree searches to skip over only a few hundred extent items.

Extent scan also had to read every extent item twice to do the transid filtering, because BeesCrawl's interface discarded the relevant information when it converted a `BtrfsTreeItem` into a `BeesFileRange`. The cost of this extra fetch was negligible, but it could have been zero.

Fix these problems as follows:

 * Copy the equivalent of `fetch_extents` from BeesCrawl into `BeesScanModeExtent`, then give each of the extent scan crawlers its own `BtrfsDataExtentTreeFetcher` instance. This enables extent tree searches to avoid pure (non-mixed) metadata block groups. `BeesCrawl` is now used only for its interface to `BeesRoots` for saving state in `beescrawl.dat`, and never to determine the next extent tree item.
 * Move subvol-specific parts of `BeesRoots` into a new class `BeesScanModeSubvol` so that `BeesScanModeExtent` doesn't have to enable or support them. In particular, `bees -m4` no longer enumerates all of the _subvol_ crawlers. `BeesRoots` is still used to save and load crawl state.
 * Move several members from `BeesScanModeExtent` into a per-crawler state object `SizeTier` to eliminate the need for some locks and to maintain separate cache state for `BtrfsDataExtentTreeFetcher`.
 * Reuse the `BtrfsTreeItem` to get the generation field for the transid range filter.
 * Avoid a few corner cases when handling errors, where extent scan might drop an extent without scanning it, or fail to advance to the next extent.
 * Enable the extent-skipping algorithm for large size tiers, now that `BeesCrawl::fetch_extents` is no longer slowing it down.
 * Add a debug stream interface which developers can easily turn on when needed to inspect the decisions that extent scan is making.
 * Track metrics that are more useful, particularly searches per extent scanned, and fraction of extents that are skipped.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
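
The single-pass filtering described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `ExtentItem`, `TierState`, `filter_items` and `skipped_fraction` are hypothetical names, the item struct is a simplified stand-in for `BtrfsTreeItem`, and the inclusive transid and size bounds are assumptions. The point is that the generation is read from the item already in hand, so no second fetch is needed, and that each crawler keeps its own counters so no lock is shared between size tiers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an extent tree item.
// This is not the bees BtrfsTreeItem class.
struct ExtentItem {
	uint64_t bytenr;      // start of the extent
	uint64_t length;      // extent length in bytes
	uint64_t generation;  // transid that created the extent
};

// Hypothetical per-crawler state. Each size tier owns its own bounds and
// counters, so nothing here is shared between tiers.
struct TierState {
	uint64_t min_size;     // inclusive lower size bound (assumption)
	uint64_t max_size;     // inclusive upper size bound (assumption)
	uint64_t min_transid;  // inclusive lower generation bound (assumption)
	uint64_t max_transid;  // inclusive upper generation bound (assumption)
	size_t scanned;        // items passed on for scanning
	size_t skipped;        // items rejected by either filter
};

// Apply the size filter and the transid filter in one pass over items that
// have already been fetched, updating the tier's own counters.
std::vector<ExtentItem> filter_items(TierState &tier, const std::vector<ExtentItem> &items)
{
	std::vector<ExtentItem> selected;
	for (const auto &item : items) {
		const bool size_ok = item.length >= tier.min_size && item.length <= tier.max_size;
		const bool gen_ok = item.generation >= tier.min_transid && item.generation <= tier.max_transid;
		if (size_ok && gen_ok) {
			selected.push_back(item);
			++tier.scanned;
		} else {
			++tier.skipped;
		}
	}
	return selected;
}

// Fraction of items seen so far that were skipped, or 0 if none were seen.
double skipped_fraction(const TierState &tier)
{
	const size_t total = tier.scanned + tier.skipped;
	return total ? static_cast<double>(tier.skipped) / static_cast<double>(total) : 0.0;
}
```

A caller would construct one `TierState` per size tier, pass each batch of fetched items through `filter_items`, and report `skipped_fraction` as one of the per-tier metrics.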
src/bees.h
@@ -521,7 +521,7 @@ class BeesCrawl {
 
 	bool fetch_extents();
 	void fetch_extents_harder();
-	bool restart_crawl();
+	bool restart_crawl_unlocked();
 	BeesFileRange bti_to_bfr(const BtrfsTreeItem &bti) const;
 
 public:
@@ -535,6 +535,7 @@ public:
 	void deferred(bool def_setting);
 	bool deferred() const;
 	bool finished() const;
+	bool restart_crawl();
 };
 
 class BeesScanMode;
@@ -543,7 +544,8 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
 	shared_ptr<BeesContext> m_ctx;
 
 	BeesStringFile m_crawl_state_file;
-	map<uint64_t, shared_ptr<BeesCrawl>> m_root_crawl_map;
+	using CrawlMap = map<uint64_t, shared_ptr<BeesCrawl>>;
+	CrawlMap m_root_crawl_map;
 	mutex m_mutex;
 	uint64_t m_crawl_dirty = 0;
 	uint64_t m_crawl_clean = 0;
@@ -562,7 +564,7 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
 	condition_variable m_stop_condvar;
 	bool m_stop_requested = false;
 
-	void insert_new_crawl();
+	CrawlMap insert_new_crawl();
 	Fd open_root_nocache(uint64_t root);
 	Fd open_root_ino_nocache(uint64_t root, uint64_t ino);
 	uint64_t transid_max_nocache();
@@ -579,12 +581,13 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
 	bool crawl_batch(shared_ptr<BeesCrawl> crawl);
 	void clear_caches();
 
-	friend class BeesScanModeExtent;
 	shared_ptr<BeesCrawl> insert_root(const BeesCrawlState &bcs);
 
 	friend class BeesCrawl;
 	friend class BeesFdCache;
 	friend class BeesScanMode;
+	friend class BeesScanModeSubvol;
+	friend class BeesScanModeExtent;
 
 public:
 	BeesRoots(shared_ptr<BeesContext> ctx);