Mirror of https://github.com/Zygo/bees.git (synced 2025-07-06 18:32:26 +02:00)
scan_one_extent: eliminate nuisance dedupes, drop caches after reading data

A laundry list of problems fixed:

 * Track which physical blocks have been read recently without making
   any changes, and don't read them again.

 * Separate dedupe, split, and hole-punching operations into distinct
   planning and execution phases.

 * Keep the longest dedupe from overlapping dedupe matches, and flatten
   them into non-overlapping operations.

 * Don't scan extents that have blocks already in the hash table.
   We can't (yet) touch such an extent without making unreachable space.
   Let them go.

 * Give better information in the scan summary visualization: show dedupe
   range start and end points (<ddd>), matching blocks (=), copy blocks
   (+), zero blocks (0), inserted blocks (.), unresolved match blocks
   (M), should-have-been-inserted-but-for-some-reason-wasn't blocks (i),
   and there's-a-bug-we-didn't-do-this-one blocks (#).

 * Drop cached data from extents that have been inserted into the hash
   table without modification.

 * Rewrite the hole punching for uncompressed extents, which apparently
   hasn't worked properly since the beginning.

Nuisance dedupe elimination:

 * Don't do more than 100 dedupe, copy, or hole-punch operations per
   extent ref.

 * Don't split an extent or punch a hole unless dedupe would save at
   least half of the extent ref's size.

 * Write a "skip:" summary showing the planned work when nuisance
   dedupe elimination decides to skip an extent.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
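
The two nuisance-dedupe limits in the message can be read as a single predicate over the planned work for one extent ref. The following is only a minimal sketch of that reading, not code from bees: the `PlannedWork` struct, its fields, `MAX_OPS_PER_EXTENT_REF`, and `is_nuisance` are hypothetical names introduced for illustration, and the real scan_one_extent may count operations or savings differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical summary of the work planned for one extent ref.
// None of these names come from the bees source.
struct PlannedWork {
	uint64_t extent_ref_size;  // bytes covered by the extent ref
	uint64_t dedupe_bytes;     // bytes the planned dedupes would save
	size_t   op_count;         // planned dedupe + copy + hole-punch operations
	bool     needs_split;      // plan splits the extent or punches a hole
};

// Assumed constant mirroring the "100 operations" limit in the message.
constexpr size_t MAX_OPS_PER_EXTENT_REF = 100;

// Returns true when the plan should be skipped as a nuisance dedupe.
bool is_nuisance(const PlannedWork &w)
{
	// A plan cannot save more bytes than the extent ref covers.
	assert(w.dedupe_bytes <= w.extent_ref_size);

	// Rule 1: too many operations for a single extent ref.
	if (w.op_count > MAX_OPS_PER_EXTENT_REF) {
		return true;
	}

	// Rule 2: a split or hole punch must save at least half of the
	// extent ref. Comparing saved bytes against the remaining bytes
	// avoids overflow and rounding on odd sizes.
	if (w.needs_split && w.dedupe_bytes < w.extent_ref_size - w.dedupe_bytes) {
		return true;
	}

	return false;
}
```

Under these assumptions, a plan that saves exactly half of a 128 KiB extent ref is accepted, while one that splits the extent to save a single 4 KiB block is skipped, which is the case where the message says a "skip:" summary is written.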
@@ -384,7 +384,7 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
 	return stop_now;
 }
 
-BeesFileRange
+BeesRangePair
 BeesResolver::replace_dst(const BeesFileRange &dst_bfr_in)
 {
 	BEESTRACE("replace_dst dst_bfr " << dst_bfr_in);
@@ -400,6 +400,7 @@ BeesResolver::replace_dst(const BeesFileRange &dst_bfr_in)
 	BEESTRACE("overlap_bfr " << overlap_bfr);
 
 	BeesBlockData bbd(dst_bfr);
+	BeesRangePair rv = { BeesFileRange(), BeesFileRange() };
 
 	for_each_extent_ref(bbd, [&](const BeesFileRange &src_bfr_in) -> bool {
 		// Open src
@@ -436,21 +437,12 @@ BeesResolver::replace_dst(const BeesFileRange &dst_bfr_in)
 			BEESCOUNT(replacedst_grown);
 		}
 
-		// Dedup
-		BEESNOTE("dedup " << brp);
-		if (m_ctx->dedup(brp)) {
-			BEESCOUNT(replacedst_dedup_hit);
-			m_found_dup = true;
-			overlap_bfr = brp.second;
-			// FIXME: find best range first, then dedupe that
-			return true; // i.e. break
-		} else {
-			BEESCOUNT(replacedst_dedup_miss);
-			return false; // i.e. continue
-		}
+		rv = brp;
+		m_found_dup = true;
+		return true;
 	});
 	// BEESLOG("overlap_bfr after " << overlap_bfr);
-	return overlap_bfr.copy_closed();
+	return rv;
 }
 
 BeesFileRange
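
In the hunks above, replace_dst stops calling m_ctx->dedup itself and instead returns the chosen range pair, which fits the commit message's split between planning and execution. The sketch below illustrates that pattern in isolation. It is an assumption-laden illustration rather than bees code: `FileRange`, `RangePair`, `is_empty`, `plan_replace`, and `execute_plan` are hypothetical stand-ins, and the caller side that performs the dedupe is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for BeesFileRange and BeesRangePair.
struct FileRange {
	uint64_t begin;
	uint64_t end;  // half-open: [begin, end)
};
using RangePair = std::pair<FileRange, FileRange>;  // (src, dst)

bool is_empty(const FileRange &r) { return r.begin == r.end; }

// Planning phase: choose the first candidate source that matches dst.
// Nothing is modified here; an empty pair means "no plan".
RangePair plan_replace(const FileRange &dst,
                       const std::vector<FileRange> &candidates,
                       const std::function<bool(const FileRange &, const FileRange &)> &matches)
{
	assert(!is_empty(dst));
	RangePair rv(FileRange{0, 0}, FileRange{0, 0});
	for (const FileRange &src : candidates) {
		if (matches(src, dst)) {
			rv = RangePair(src, dst);
			break;  // like "return true" in the callback: stop searching
		}
	}
	return rv;
}

// Execution phase: the caller decides whether to act on the plan.
bool execute_plan(const RangePair &plan,
                  const std::function<bool(const RangePair &)> &dedupe)
{
	if (is_empty(plan.second)) {
		return false;  // nothing was planned
	}
	return dedupe(plan);
}
```

Returning the plan instead of acting on it lets a caller collect several candidate operations, compare them, and discard the ones it judges not worthwhile before any data is touched.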