bees: use helper function for readahead

There seem to be multiple ways to do readahead in Linux, and only some of them work. Hopefully reading the actual data is one of them.

This is an attempt to avoid page-by-page reads in the generic dedupe code. We load both extents into the VFS cache by reading them sequentially, and hope they are still there by the time we call dedupe on them.

We also call readahead(2), and hopefully that either helps or does nothing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-12-24 20:40:21 +01:00 · 2021-05-28 01:58:16 -04:00
parent 0afd2850f4
commit 20b8f8ae0b
4 changed files with 34 additions and 5 deletions
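
The helper itself is not visible in the hunks shown here, so the following is only a minimal sketch of the approach the commit message describes: issue a readahead(2) hint, then read the range sequentially so the data really lands in the page cache. The name `sketch_readahead`, the 128 KiB buffer size, the return value, and the error handling are all assumptions made for illustration. This is not the actual `bees_readahead` code.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         // readahead(2) is a GNU/Linux extension (g++ usually defines this already)
#endif
#include <fcntl.h>          // readahead(2), open(2)
#include <unistd.h>         // pread(2), close(2)
#include <sys/types.h>      // off_t, ssize_t
#include <algorithm>        // std::min
#include <cerrno>           // errno, EINTR
#include <vector>

// Hypothetical helper for illustration only (not the function added by this commit).
// Returns the number of bytes actually read (short at EOF), or -1 on a read error.
ssize_t sketch_readahead(int fd, off_t offset, size_t size)
{
    // Advisory hint to the kernel. It may help or do nothing, so a failure is ignored.
    (void) readahead(fd, offset, size);

    // Brute force: read the range sequentially in large chunks and discard the data.
    // The buffer size is an arbitrary assumption.
    std::vector<char> buf(128 * 1024);
    size_t total = 0;
    while (total < size) {
        const size_t want = std::min(buf.size(), size - total);
        const ssize_t got = pread(fd, buf.data(), want, offset + static_cast<off_t>(total));
        if (got < 0) {
            if (errno == EINTR) continue;   // interrupted by a signal, retry
            return -1;                      // real error, give up
        }
        if (got == 0) break;                // end of file
        total += static_cast<size_t>(got);
    }
    return static_cast<ssize_t>(total);
}
```

A call site would then look like the replaced lines in the diff below, for example `sketch_readahead(second.fd(), e_second.begin(), e_second.size())`. Nothing pins the pages in memory, so as the commit message says, the data may still be evicted before dedupe is called.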
--- a/src/bees-types.cc
+++ b/src/bees-types.cc
@@ -385,8 +385,8 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
 	BEESTRACE("e_second " << e_second);

 	// Preread entire extent
-	readahead(second.fd(), e_second.begin(), e_second.size());
-	readahead(first.fd(), e_second.begin() + first.begin() - second.begin(), e_second.size());
+	bees_readahead(second.fd(), e_second.begin(), e_second.size());
+	bees_readahead(first.fd(), e_second.begin() + first.begin() - second.begin(), e_second.size());

 	auto hash_table = ctx->hash_table();

@@ -405,7 +405,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
 				BEESCOUNT(pairbackward_hole);
 				break;
 			}
-			readahead(second.fd(), e_second.begin(), e_second.size());
+			bees_readahead(second.fd(), e_second.begin(), e_second.size());
 #else
 			// This tends to repeatedly process extents that were recently processed.
 			// We tend to catch duplicate blocks early since we scan them forwards.
@@ -514,7 +514,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
 				BEESCOUNT(pairforward_hole);
 				break;
 			}
-			readahead(second.fd(), e_second.begin(), e_second.size());
+			bees_readahead(second.fd(), e_second.begin(), e_second.size());
 		}
 		BEESCOUNT(pairforward_try);