mirror of
https://github.com/Zygo/bees.git
synced 2025-05-17 21:35:45 +02:00
resolve: break up long intra-extent dedup loops
When both block candidates for dedup are located in the same extent, bees excludes them from deduplication because the dedup operation would not free any space (both blocks are still referenced, so neither is deleted). Candidates in other extents are still considered. Typically a few blocks are duplicated many thousands or even millions of times within a filesystem. Many of these blocks appear in the same extent as each other. In cases where an extent contains an extremely common duplicate block, it may appear multiple times in many extents. bees can get into a loop with a very bad worst-case running time: 32768 blocks per extent * 2560 bees reference limit * 256 distinct hash table entries = 21.5 *billion* iterations...squared, because this loop happens every time bees encounteres any of the references. Not an infinite number, but close enough. In each iteration of the loop, replace_dst detects that both src and dst block are part of the same btrfs extent data item and therefore should not be deduped; however, this occurs after the block has been allocated and read by chase_extent_ref. This dst is discarded, but the outer loop tries again with another reference to the same block and gets the same result. An easy fix for this problem is to stop the loop immediately when the same physical extent is found in both src and dst. The condition is rare enough to ignore the negligible space efficiency loss, and filesystem scan stops dead if the loop is allowed to proceed. An exception is thrown to terminate the loop at scan_one_extent from within replace_dst. It would be better to determine the extent bytenr of each candidate extent and filter them out in scan_one_extent (which reduces the number of LOGICAL_INO calls as a side-effect), but bees has no code capable of doing extent data tree lookups with backward iteration yet. Even better would be to change the hash table format so that the extent bytenr can be decoded directly from the hash table entry (this already exists for compressed extents). Both of these changes are too large for v0.6. Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit is contained in:
parent
2ac94438bd
commit
33d274eabd
@ -489,7 +489,8 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
|
||||
|
||||
BeesAddress last_replaced_addr;
|
||||
for (auto it = resolved_addrs.begin(); it != resolved_addrs.end(); ++it) {
|
||||
catch_all([&]() {
|
||||
// FIXME: Need to terminate this loop on replace_dst exception condition
|
||||
// catch_all([&]() {
|
||||
auto it_copy = *it;
|
||||
BEESNOTE("finding one match (out of " << it_copy.count() << ") at " << it_copy.addr() << " for " << bbd);
|
||||
BEESTRACE("finding one match (out of " << it_copy.count() << ") at " << it_copy.addr() << " for " << bbd);
|
||||
@ -541,7 +542,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
|
||||
} else {
|
||||
BEESCOUNT(scan_dup_miss);
|
||||
}
|
||||
});
|
||||
// });
|
||||
}
|
||||
if (last_replaced_addr) {
|
||||
// If we replaced extents containing the incoming addr,
|
||||
|
@ -355,7 +355,8 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
|
||||
}
|
||||
|
||||
// Look at the old data
|
||||
catch_all([&]() {
|
||||
// FIXME: propagate exceptions for now. Proper fix requires a rewrite.
|
||||
// catch_all([&]() {
|
||||
BEESTRACE("chase_extent_ref ino " << ino_off_root << " bbd " << bbd);
|
||||
auto new_range = chase_extent_ref(ino_off_root, bbd);
|
||||
// XXX: should we catch visitor's exceptions here?
|
||||
@ -370,7 +371,7 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
|
||||
// to a different extent between them.
|
||||
// stop_now = true;
|
||||
}
|
||||
});
|
||||
// });
|
||||
|
||||
if (stop_now) {
|
||||
break;
|
||||
@ -411,7 +412,8 @@ BeesResolver::replace_dst(const BeesFileRange &dst_bfr)
|
||||
BeesBlockData src_bbd(src_bfr.fd(), src_bfr.begin(), min(BLOCK_SIZE_SUMS, src_bfr.size()));
|
||||
if (bbd.addr().get_physical_or_zero() == src_bbd.addr().get_physical_or_zero()) {
|
||||
BEESCOUNT(replacedst_same);
|
||||
return false; // i.e. continue
|
||||
// stop looping here, all the other srcs will probably fail this test too
|
||||
throw runtime_error("FIXME: bailing out here, need to fix this further up the call stack");
|
||||
}
|
||||
|
||||
// Make pair(src, dst)
|
||||
|
Loading…
x
Reference in New Issue
Block a user