context: don't let multiple worker Tasks get stuck on a single extent or inode

When two Tasks attempt to lock the same extent, append the later Task to the earlier Task's post-exec work queue. This will guarantee that all Tasks which attempt to manipulate the same extent will execute sequentially, and free up threads to process other extents.

Similarly, if two scanner threads operate on the same inode, any dedupe they perform will lock out other scanner threads in btrfs. Avoid this by serializing Task objects that reference the same file.

This does theoretically use an unbounded amount of memory, but in practice a Task that encounters a contended extent or inode quickly stops spawning new Tasks that might increase the queue size, and all Tasks that might contend for the same lock(s) end up on a single FIFO queue.

Note that the scope of inode locks is intentionally global, i.e. when an inode is locked, it locks every inode with the same number in every subvol. This avoids significant lock contention and task queue growth when the same inode with the same file extents appears in snapshots.

Fixes: https://github.com/Zygo/bees/issues/158
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
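
The queueing behaviour described in this message can be illustrated with a small single-threaded sketch. This is not the bees implementation: `SketchTask`, `SketchExclusion`, `process_extent`, `finish`, and `run_demo` are hypothetical names invented for illustration, and the real code runs Tasks on worker threads. The sketch only shows the idea that a Task which fails to take the lock is appended to the holder's post-exec FIFO queue and runs after the holder finishes.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Task: a name plus a FIFO queue of Tasks
// that should run after this one finishes.
struct SketchTask {
    std::string name;
    std::deque<std::shared_ptr<SketchTask>> post_exec;
    explicit SketchTask(std::string n) : name(std::move(n)) {}
};
using TaskPtr = std::shared_ptr<SketchTask>;

// Hypothetical non-blocking lock that remembers which Task holds it.
class SketchExclusion {
    TaskPtr holder_;
public:
    // Returns true if `task` now holds the lock. Otherwise `task` is
    // appended to the holder's post-exec queue and false is returned.
    bool try_lock(const TaskPtr& task) {
        if (!holder_) { holder_ = task; return true; }
        holder_->post_exec.push_back(task);
        return false;
    }
    void unlock() {
        assert(holder_ && "unlock called without a holder");
        holder_.reset();
    }
};

// Attempt to work on the contended extent on behalf of `task`.
bool process_extent(const TaskPtr& task, SketchExclusion& lock,
                    std::vector<std::string>& log) {
    if (!lock.try_lock(task)) {
        log.push_back(task->name + " deferred");
        return false;  // not rescheduled here: the holder's queue does that
    }
    log.push_back(task->name + " locked");
    return true;
}

// Called when `task` is done: release the lock, then run the deferred
// Tasks one at a time in FIFO order.
void finish(const TaskPtr& task, SketchExclusion& lock,
            std::vector<std::string>& log) {
    lock.unlock();
    while (!task->post_exec.empty()) {
        TaskPtr next = task->post_exec.front();
        task->post_exec.pop_front();
        if (process_extent(next, lock, log)) finish(next, lock, log);
    }
}

// Three Tasks contend for one extent; B and C arrive while A holds it.
std::vector<std::string> run_demo() {
    std::vector<std::string> log;
    SketchExclusion extent_lock;
    auto a = std::make_shared<SketchTask>("A");
    auto b = std::make_shared<SketchTask>("B");
    auto c = std::make_shared<SketchTask>("C");
    process_extent(a, extent_lock, log);
    process_extent(b, extent_lock, log);
    process_extent(c, extent_lock, log);
    finish(a, extent_lock, log);
    return log;
}
```

In this sketch, B and C never block a thread while waiting. They sit on A's queue and execute sequentially once A releases the lock, which is the property the commit message relies on.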
2025-07-07 02:42:27 +02:00 · 2022-11-20 12:01:16 -05:00
parent 31d26bcfc6
commit 84f91af503
4 changed files with 50 additions and 15 deletions
--- a/src/bees-roots.cc
+++ b/src/bees-roots.cc
@@ -243,6 +243,18 @@ BeesFileCrawl::crawl_one_extent()
 {
 	BEESNOTE("crawl_one_extent m_offset " << to_hex(m_offset) << " state " << m_state);
 	BEESTRACE("crawl_one_extent m_offset " << to_hex(m_offset) << " state " << m_state);
+
+	// Only one thread can dedupe a file.  btrfs will lock others out.
+	// Inodes are usually full of shared extents, especially in the case of snapshots,
+	// so when we lock an inode, we'll lock the same inode number in all subvols at once.
+	auto inode_mutex = m_ctx->get_inode_mutex(m_bedf.objectid());
+	auto inode_lock = inode_mutex->try_lock(Task::current_task());
+	if (!inode_lock) {
+		BEESCOUNT(scanf_deferred_inode);
+		// Returning false here means we won't reschedule ourselves, but inode_mutex will do that
+		return false;
+	}
+
 	// If we hit an exception here we don't try to catch it.
 	// It will mean the file or subvol was deleted or there's metadata corruption,
 	// and we should stop trying to scan the inode in that case.