1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

context: add experimental code for avoiding tiny extents

In the current architecture we can't directly measure the physical extent
size, and we can't make good decisions with the extent data (reference)
item alone.  If the early return is enabled here, there is a small speedup
and a large drop in dedupe hit rate, especially when extent splits occur.

Leave the early return commented for now, but collect the event statistics.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit is contained in:
Zygo Blaxell 2021-09-09 12:47:02 -04:00
parent a60c53a9e1
commit 11fabd66a8
2 changed files with 10 additions and 0 deletions

View File

@ -333,6 +333,7 @@ The `scan` event group consists of operations related to scanning incoming data.
* `scan_eof`: Scan past EOF was attempted.
* `scan_erase_redundant`: Blocks in the hash table were removed because they were removed from the filesystem by dedupe.
* `scan_extent`: An extent was scanned (`scan_one_extent`).
* `scan_extent_tiny`: An extent below 128K that was not the beginning or end of a file was scanned. No action is currently taken for these--they are merely counted.
* `scan_forward`: A logical byte range was scanned (`scan_forward`).
* `scan_found`: An entry was found in the hash table matching a scanned block from the filesystem.
* `scan_hash_hit`: A block was found on the filesystem corresponding to a block found in the hash table.

View File

@ -292,6 +292,15 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
BEESTRACE("scan extent " << e);
BEESCOUNT(scan_extent);
// EXPERIMENT: Don't bother with tiny extents unless they are the entire file.
// We'll take a tiny extent at BOF or EOF but not in between.
if (e.begin() && e.size() < 128 * 1024 && e.end() != Stat(bfr.fd()).st_size) {
BEESCOUNT(scan_extent_tiny);
// This doesn't work properly with the current architecture,
// so we don't do an early return here.
// return bfr;
}
// We keep moving this method around
auto m_ctx = shared_from_this();