mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

crawl: filter extents correctly

When an extent ref is modified, all of the refs in the same metadata
page get the same transid in the TREE_SEARCH_V2 header.  This causes
two problems:

	- Extents with generation < min_transid are included if they
	happen to be referenced by pages with generation >= min_transid.

	- Extent refs with generation > max_transid are excluded even
	if they reference extents with generation <= max_transid.

Both of these are wrong:  the first causes some extents to be repeatedly
scanned, the second causes some extents to not be scanned at all.
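
To make the two failure modes concrete, here is a minimal sketch.  It is
not bees code: RefSample, page_transid, ref_generation, and both helper
functions are hypothetical names that model one extent ref together with
the transid of the metadata page that holds it.

```cpp
#include <cstdint>

// Hypothetical model of one extent ref as the crawler sees it.
struct RefSample {
	std::uint64_t page_transid;    // transid from the TREE_SEARCH_V2 item header
	std::uint64_t ref_generation;  // extent generation copied into the extent ref
};

// Decision based only on the transid of the page holding the ref.
constexpr bool accepted_by_page_transid(RefSample r, std::uint64_t min_transid, std::uint64_t max_transid)
{
	return r.page_transid >= min_transid && r.page_transid <= max_transid;
}

// Decision based on the generation of the extent the ref points to.
constexpr bool wanted_by_ref_generation(RefSample r, std::uint64_t min_transid, std::uint64_t max_transid)
{
	return r.ref_generation >= min_transid && r.ref_generation <= max_transid;
}

// Problem 1: an old extent (generation 5) in a page rewritten at transid 12.
constexpr RefSample old_extent_in_new_page{12, 5};

// Problem 2: an in-range extent (generation 15) in a page rewritten at transid 30.
constexpr RefSample wanted_extent_in_newer_page{30, 15};
```

With an assumed crawl window of [10, 20], the first sample passes the
page-based test although its extent is too old, and the second sample
fails the page-based test although its extent is in range.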

Change the TREE_SEARCH_V2 parameters so that Crawl sees all extent refs
at or after min_transid (i.e. set max_transid to max).  The TREE_SEARCH_V2
kernel logic already operates this way, i.e. it fetches every page with
transid >= min_transid and discards newer items if they are too new for
max_transid.  Filter strictly by the extent reference generation field
(i.e. the copy of the extent generation that is in the extent reference).
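
The approach described above can be sketched roughly as follows.  This is
an illustration under stated assumptions, not the actual patch:
SearchWindow, kernel_window, and should_scan are hypothetical names, and
SearchWindow only stands in for the two transid fields of the real
search key.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the transid fields of the search key.
struct SearchWindow {
	std::uint64_t min_transid;
	std::uint64_t max_transid;
};

// Leave the upper bound open so the kernel returns every page
// at or after the crawl's min_transid.
constexpr SearchWindow kernel_window(std::uint64_t crawl_min_transid)
{
	return SearchWindow{crawl_min_transid, std::numeric_limits<std::uint64_t>::max()};
}

// User-space filter applied to each returned ref: keep it only if the
// generation stored in the ref lies inside the crawl window (inclusive).
constexpr bool should_scan(std::uint64_t ref_generation,
                           std::uint64_t crawl_min_transid,
                           std::uint64_t crawl_max_transid)
{
	return ref_generation >= crawl_min_transid && ref_generation <= crawl_max_transid;
}
```

The bounds are inclusive on both ends, matching the two tests in the
diff, which skip a ref only when gen < m_min_transid or
gen > m_max_transid.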

Note this still scans extent data multiple times, but it should now
be exactly once per extent reference.  A proper fix for this requires
extent-based scanning instead of extent-ref-based scanning.

Formerly commit 5a8c655fc447c08772f01107a87e3364f093bb46 "roots: filter
out obsolete extents from extent refs" which landed in the subvol-threads
branch but not master.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Zygo Blaxell 2018-01-31 22:48:39 -05:00
parent 408b6ae138
commit 087ec26c44

@@ -864,7 +864,11 @@ BeesCrawl::fetch_extents()
 	sk.min_type = sk.max_type = BTRFS_EXTENT_DATA_KEY;
 	sk.min_offset = old_state.m_offset;
 	sk.min_transid = old_state.m_min_transid;
-	sk.max_transid = old_state.m_max_transid;
+	// Don't set max_transid here. We want to see old extents with
+	// new references, and max_transid filtering in the kernel locks
+	// the filesystem while slowing us down.
+	// sk.max_transid = old_state.m_max_transid;
+	// sk.max_transid = numeric_limits<uint64_t>::max();
 	sk.nr_items = BEES_MAX_CRAWL_SIZE;
 	// Lock in the old state
@@ -933,14 +937,26 @@ BeesCrawl::fetch_extents()
 	if (gen < get_state().m_min_transid) {
 		BEESCOUNT(crawl_gen_low);
 		++count_low;
-		// We probably want (need?) to scan these anyway.
-		// continue;
+		// We want (need?) to scan these anyway?
+		// The header generation refers to the transid
+		// of the metadata page holding the current ref.
+		// This includes anything else in that page that
+		// happened to be modified, regardless of how
+		// old it is.
+		// The file_extent_generation refers to the
+		// transid of the extent item's page, which is
+		// a different approximation of what we want.
+		// Combine both of these filters to minimize
+		// the number of times we unnecessarily re-read
+		// an extent.
+		continue;
 	}
 	if (gen > get_state().m_max_transid) {
 		BEESCOUNT(crawl_gen_high);
 		++count_high;
-		// This shouldn't ever happen
-		// continue;
+		// We have to filter these here because we can't
+		// do it in the kernel.
+		continue;
 	}
 	auto type = call_btrfs_get(btrfs_stack_file_extent_type, i.m_data);