mirror of https://github.com/Zygo/bees.git synced 2025-07-06 18:32:26 +02:00

scan_one_extent: eliminate nuisance dedupes, drop caches after reading data

A laundry list of problems fixed:

 * Track which physical blocks have been read recently without making
 any changes, and don't read them again.

 * Separate dedupe, split, and hole-punching operations into distinct
 planning and execution phases.

 * Keep the longest dedupe from overlapping dedupe matches, and flatten
 them into non-overlapping operations.

 * Don't scan extents that have blocks already in the hash table.
 We can't (yet) touch such an extent without making unreachable space.
 Let them go.

 * Give better information in the scan summary visualization:  show dedupe
 range start and end points (<ddd>), matching blocks (=), copy blocks
 (+), zero blocks (0), inserted blocks (.), unresolved match blocks
 (M), should-have-been-inserted-but-for-some-reason-wasn't blocks (i),
 and there's-a-bug-we-didn't-do-this-one blocks (#).

 * Drop cached data from extents that have been inserted into the hash
 table without modification.

 * Rewrite the hole punching for uncompressed extents, which apparently
 hasn't worked properly since the beginning.
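
The "keep the longest dedupe" item above can be illustrated with a small sketch. This is not the code from this commit: PlannedDedupe and keep_longest_non_overlapping are hypothetical names, ranges are assumed to be half-open byte ranges, and the sketch simply drops a shorter match that overlaps a longer one rather than trimming it, which may differ from what bees does.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one planned dedupe over the half-open range [begin, end).
struct PlannedDedupe {
	uint64_t begin;
	uint64_t end;
	uint64_t length() const { return end - begin; }
};

// Greedy sketch: visit candidates longest first and keep a candidate only if
// it does not overlap anything already kept.  The result is sorted by offset,
// so it can be executed as a flat list of non-overlapping operations.
std::vector<PlannedDedupe>
keep_longest_non_overlapping(std::vector<PlannedDedupe> candidates)
{
	std::sort(candidates.begin(), candidates.end(),
		[](const PlannedDedupe &a, const PlannedDedupe &b) {
			if (a.length() != b.length()) return a.length() > b.length();
			return a.begin < b.begin;
		});

	std::vector<PlannedDedupe> kept;
	for (const auto &c : candidates) {
		const bool overlaps = std::any_of(kept.begin(), kept.end(),
			[&](const PlannedDedupe &k) {
				return c.begin < k.end && k.begin < c.end;
			});
		if (!overlaps) kept.push_back(c);
	}

	std::sort(kept.begin(), kept.end(),
		[](const PlannedDedupe &a, const PlannedDedupe &b) {
			return a.begin < b.begin;
		});
	return kept;
}
```

Sorting by length first is what makes the longest match win; a variant that trims the shorter match to its non-overlapping remainder would keep more dedupe at the cost of more operations.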

Nuisance dedupe elimination:

 * Don't do more than 100 dedupe, copy, or hole-punch operations per
 extent ref.

 * Don't split an extent or punch a hole unless dedupe would save at
 least half of the extent ref's size.

 * Write a "skip:" summary showing the planned work when nuisance
 dedupe elimination decides to skip an extent.
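
The two thresholds above can be read as a single predicate over the planned work. The following is a sketch under stated assumptions, not the commit's implementation: ExtentPlan, is_nuisance, and max_ops_per_extent_ref are hypothetical names, and the 100-operation limit is assumed to apply to the combined count of dedupe, copy, and hole-punch operations.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical summary of the work planned for one extent ref.
struct ExtentPlan {
	uint64_t extent_ref_size;  // size of the extent ref in bytes
	uint64_t dedupe_bytes;     // bytes that dedupe would save
	size_t   dedupe_ops;       // planned dedupe operations
	size_t   copy_ops;         // planned copies (these split the extent)
	size_t   hole_punch_ops;   // planned hole punches
};

// Assumption: the limit counts all three operation types together.
constexpr size_t max_ops_per_extent_ref = 100;

// Returns true when the plan should be skipped as a nuisance dedupe.
bool is_nuisance(const ExtentPlan &p)
{
	// Rule 1: too many operations for a single extent ref.
	if (p.dedupe_ops + p.copy_ops + p.hole_punch_ops > max_ops_per_extent_ref) {
		return true;
	}

	// Rule 2: splitting or hole punching is only worthwhile when dedupe
	// saves at least half of the extent ref.  The comparison is written
	// without multiplication or unsigned underflow.
	const bool needs_split_or_hole = p.copy_ops > 0 || p.hole_punch_ops > 0;
	const bool saves_half = p.dedupe_bytes >= p.extent_ref_size
		|| p.dedupe_bytes >= p.extent_ref_size - p.dedupe_bytes;
	return needs_split_or_hole && !saves_half;
}
```

A plan made only of dedupe operations is never rejected by the second rule in this sketch, since it neither splits the extent nor punches a hole.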

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Zygo Blaxell
2024-11-23 11:14:37 -05:00
parent 97eab9655c
commit 24b08ef7b7
3 changed files with 363 additions and 176 deletions


@@ -384,7 +384,7 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
 	return stop_now;
 }
-BeesFileRange
+BeesRangePair
 BeesResolver::replace_dst(const BeesFileRange &dst_bfr_in)
 {
 	BEESTRACE("replace_dst dst_bfr " << dst_bfr_in);
@@ -400,6 +400,7 @@ BeesResolver::replace_dst(const BeesFileRange &dst_bfr_in)
 	BEESTRACE("overlap_bfr " << overlap_bfr);
 	BeesBlockData bbd(dst_bfr);
+	BeesRangePair rv = { BeesFileRange(), BeesFileRange() };
 	for_each_extent_ref(bbd, [&](const BeesFileRange &src_bfr_in) -> bool {
 		// Open src
@@ -436,21 +437,12 @@ BeesResolver::replace_dst(const BeesFileRange &dst_bfr_in)
 			BEESCOUNT(replacedst_grown);
 		}
-		// Dedup
-		BEESNOTE("dedup " << brp);
-		if (m_ctx->dedup(brp)) {
-			BEESCOUNT(replacedst_dedup_hit);
-			m_found_dup = true;
-			overlap_bfr = brp.second;
-			// FIXME: find best range first, then dedupe that
-			return true; // i.e. break
-		} else {
-			BEESCOUNT(replacedst_dedup_miss);
-			return false; // i.e. continue
-		}
+		rv = brp;
+		m_found_dup = true;
+		return true;
 	});
 	// BEESLOG("overlap_bfr after " << overlap_bfr);
-	return overlap_bfr.copy_closed();
+	return rv;
 }
 BeesFileRange
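
The change to replace_dst above follows the planning/execution split described in the commit message: the resolver now reports the matching range pair instead of calling dedup itself. A minimal sketch of that pattern follows, assuming C++17. Range, RangePair, plan_replace, and execute_plan are hypothetical names for illustration and are not part of bees.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical half-open byte range [begin, end).
struct Range {
	uint64_t begin;
	uint64_t end;
};

// Hypothetical (src, dst) pair, loosely analogous to a planned dedupe.
using RangePair = std::pair<Range, Range>;

// Planning phase: pick the first candidate source with the same length as
// dst and return the pair.  Nothing is modified here.
std::optional<RangePair>
plan_replace(const Range &dst, const std::vector<Range> &candidates)
{
	for (const auto &src : candidates) {
		if (src.end - src.begin == dst.end - dst.begin) {
			return RangePair{src, dst};
		}
	}
	return std::nullopt;
}

// Execution phase: apply every planned pair through the supplied dedupe
// callback and count how many succeeded.
size_t
execute_plan(const std::vector<RangePair> &plan,
	const std::function<bool(const RangePair &)> &dedupe)
{
	size_t succeeded = 0;
	for (const auto &pair : plan) {
		if (dedupe(pair)) ++succeeded;
	}
	return succeeded;
}
```

Keeping the plan as plain data lets the caller inspect, filter, or discard it before anything touches the filesystem, which is what the nuisance dedupe checks rely on.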