mirror of https://github.com/Zygo/bees.git (synced 2025-05-17 21:35:45 +02:00)
crawl: filter extents correctly

When an extent ref is modified, all of the refs in the same metadata
page get the same transid in the TREE_SEARCH_V2 header. This causes
two problems:

  - Extents with generation < min_transid are included if they
    happen to be referenced by pages with generation >= min_transid.

  - Extent refs with generation > max_transid are excluded even if
    they reference extents with generation <= max_transid.

Both of these are wrong: the first causes some extents to be
repeatedly scanned, the second causes some extents to not be scanned
at all.

Change the TREE_SEARCH_V2 parameters so that Crawl sees all extents
newer than min_transid (i.e. set max_transid to max). The
TREE_SEARCH_V2 kernel logic already operates this way, i.e. it fetches
every page with transid >= min_transid and discards newer items if
they are too new for max_transid.

Filter strictly by the extent reference generation field (i.e. the
copy of the extent generation that is in the extent reference).

Note this still scans extent data multiple times, but it should now be
exactly once per extent reference. A proper fix for this requires
extent-based scanning instead of extent-ref-based scanning.

Formerly commit 5a8c655fc447c08772f01107a87e3364f093bb46 "roots:
filter out obsolete extents from extent refs" which landed in the
subvol-threads branch but not master.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
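
The filtering rule described in the message above can be sketched in isolation. This is a minimal illustration and not the bees implementation: `ExtentRef`, its fields, and `filter_refs` are hypothetical names, and the real code reads the generation out of a btrfs file extent item rather than a plain struct.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one item returned by TREE_SEARCH_V2.
struct ExtentRef {
    uint64_t header_transid;     // transid of the metadata page holding the ref
    uint64_t extent_generation;  // copy of the extent generation stored in the ref
};

// Keep only refs whose extent generation lies in [min_transid, max_transid].
// The header transid is deliberately ignored: it is shared by every ref in
// the same metadata page, so it says little about the age of the extent.
std::vector<ExtentRef> filter_refs(const std::vector<ExtentRef>& refs,
                                   uint64_t min_transid,
                                   uint64_t max_transid) {
    std::vector<ExtentRef> kept;
    for (const auto& ref : refs) {
        if (ref.extent_generation < min_transid) {
            continue;  // old extent that only appears because its page was rewritten
        }
        if (ref.extent_generation > max_transid) {
            continue;  // too new for this pass; cannot be filtered in the kernel
        }
        kept.push_back(ref);
    }
    return kept;
}
```

With a window of [10, 100], three refs that all sit in a page with header transid 100 but carry extent generations 5, 50, and 150 would yield only the ref with generation 50.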
parent 408b6ae138
commit 087ec26c44
@@ -864,7 +864,11 @@ BeesCrawl::fetch_extents()
 	sk.min_type = sk.max_type = BTRFS_EXTENT_DATA_KEY;
 	sk.min_offset = old_state.m_offset;
 	sk.min_transid = old_state.m_min_transid;
-	sk.max_transid = old_state.m_max_transid;
+	// Don't set max_transid here. We want to see old extents with
+	// new references, and max_transid filtering in the kernel locks
+	// the filesystem while slowing us down.
+	// sk.max_transid = old_state.m_max_transid;
+	// sk.max_transid = numeric_limits<uint64_t>::max();
 	sk.nr_items = BEES_MAX_CRAWL_SIZE;
 
 	// Lock in the old state
@@ -933,14 +937,26 @@ BeesCrawl::fetch_extents()
 		if (gen < get_state().m_min_transid) {
 			BEESCOUNT(crawl_gen_low);
 			++count_low;
-			// We probably want (need?) to scan these anyway.
-			// continue;
+			// We want (need?) to scan these anyway?
+			// The header generation refers to the transid
+			// of the metadata page holding the current ref.
+			// This includes anything else in that page that
+			// happened to be modified, regardless of how
+			// old it is.
+			// The file_extent_generation refers to the
+			// transid of the extent item's page, which is
+			// a different approximation of what we want.
+			// Combine both of these filters to minimize
+			// the number of times we unnecessarily re-read
+			// an extent.
+			continue;
 		}
 		if (gen > get_state().m_max_transid) {
 			BEESCOUNT(crawl_gen_high);
 			++count_high;
-			// This shouldn't ever happen
-			// continue;
+			// We have to filter these here because we can't
+			// do it in the kernel.
+			continue;
 		}
 
 		auto type = call_btrfs_get(btrfs_stack_file_extent_type, i.m_data);
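
The search-key change in the first hunk can be sketched as follows. `SearchKey` and `make_crawl_key` are hypothetical stand-ins: the struct mirrors a few fields of the kernel's `btrfs_ioctl_search_key`, and the diff itself leaves `max_transid` untouched rather than assigning it, so the explicit assignment here is an assumption about what the untouched default amounts to.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in mirroring a subset of the kernel's
// btrfs_ioctl_search_key fields; this is not the type bees uses.
struct SearchKey {
    uint64_t min_offset = 0;
    uint64_t min_transid = 0;
    uint64_t max_transid = 0;
    uint32_t min_type = 0;
    uint32_t max_type = 0;
    uint32_t nr_items = 0;
};

// Numeric value of BTRFS_EXTENT_DATA_KEY in the kernel headers.
constexpr uint32_t kExtentDataKey = 108;

// Build a key that asks for every extent data item in pages with
// transid >= min_transid, with no upper bound applied by the kernel.
SearchKey make_crawl_key(uint64_t offset, uint64_t min_transid, uint32_t max_items) {
    SearchKey sk;
    sk.min_type = sk.max_type = kExtentDataKey;
    sk.min_offset = offset;
    sk.min_transid = min_transid;
    // Assumption: "not setting" max_transid is equivalent to the maximum
    // value, so the upper bound has to be enforced in userspace instead.
    sk.max_transid = std::numeric_limits<uint64_t>::max();
    sk.nr_items = max_items;
    return sk;
}
```

Leaving the upper bound open means the caller receives items that may be too new for the current pass, which is why the second hunk turns the `gen > m_max_transid` check into a real `continue`.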