mirror of https://github.com/Zygo/bees.git synced 2025-08-01 13:23:28 +02:00

56 Commits

Author SHA1 Message Date
Zygo Blaxell
3a17a4dcdd tempfile: make sure FS_COMPR_FL stays set
btrfs will set the FS_NOCOMP_FL flag when all of the following are true:

1.  The filesystem is not mounted with the `compress-force` option
2.  Heuristic analysis of the data suggests the data is compressible
3.  Compression fails to produce a result that is smaller than the original

If the compression ratio is 40%, and the original data is 128K long,
then compressed data will be about 52K long (rounded up to 4K), so item
3 is usually false; however, if the original data is 8K long, then the
compressed data will be 8K long too, and btrfs will set FS_NOCOMP_FL.

To work around that, keep setting FS_COMPR_FL and clearing FS_NOCOMP_FL
every time a TempFile is reset.
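The per-reset fixup can be sketched with the standard `FS_IOC_GETFLAGS`/`FS_IOC_SETFLAGS` interface from `<linux/fs.h>`; the helper names here are hypothetical, not bees' actual code:

```cpp
#include <cassert>
#include <linux/fs.h>   // FS_COMPR_FL, FS_NOCOMP_FL, FS_IOC_GETFLAGS, FS_IOC_SETFLAGS
#include <sys/ioctl.h>

// Pure helper: ask for compression again, and drop the FS_NOCOMP_FL bit
// that btrfs may have set behind our back.  Other bits are left alone.
inline unsigned long reset_compress_flags(unsigned long flags) {
    return (flags | FS_COMPR_FL) & ~static_cast<unsigned long>(FS_NOCOMP_FL);
}

// Hypothetical per-reset fixup for a TempFile fd.
inline int tempfile_reassert_compression(int fd) {
    unsigned long flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) return -1;
    unsigned long wanted = reset_compress_flags(flags);
    if (wanted == flags) return 0;   // nothing to do
    return ioctl(fd, FS_IOC_SETFLAGS, &wanted);
}
```

Calling something like this on every TempFile reset keeps FS_COMPR_FL set even after btrfs has given up on an incompressible extent.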

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-29 23:25:36 -04:00
Zygo Blaxell
4039ef229e tempfile: clear FS_NOCOW_FL while setting FS_COMPR_FL
FS_NOCOW_FL can be inherited from the subvol root directory, and it
conflicts with FS_COMPR_FL.

We can only dedupe when FS_NOCOW_FL is the same on src and dst, which
means we can only dedupe when FS_NOCOW_FL is clear, so we should clear
FS_NOCOW_FL on the temporary files we create for dedupe.
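As a rough sketch (the helper names are made up for illustration; only the flag constants come from `<linux/fs.h>`), the dedupe precondition and the temporary-file flag fixup look like:

```cpp
#include <cassert>
#include <linux/fs.h>  // FS_COMPR_FL, FS_NOCOMP_FL, FS_NOCOW_FL

// Dedupe requires the FS_NOCOW_FL state to match on src and dst, and bees'
// temporary files are COW, so in practice the bit must be clear on both.
inline bool dedupe_flags_compatible(unsigned long src_flags, unsigned long dst_flags) {
    return (src_flags & FS_NOCOW_FL) == 0 && (dst_flags & FS_NOCOW_FL) == 0;
}

// Flags for a freshly created temporary file: request compression, and drop
// any FS_NOCOW_FL inherited from the subvol root (it conflicts with FS_COMPR_FL).
inline unsigned long tempfile_flags(unsigned long inherited) {
    return (inherited | FS_COMPR_FL)
         & ~static_cast<unsigned long>(FS_NOCOW_FL | FS_NOCOMP_FL);
}
```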

Fixes: https://github.com/Zygo/bees/issues/314
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-29 23:24:55 -04:00
Zygo Blaxell
e9d4aa4586 roots: make the "idle" label useful
Apply the "idle" label only when the crawl is finished _and_ its
transid_max is up to date.  This makes the keyword "idle" better reflect
that bees is not only finished crawling, but also finished scanning the
crawled extents in the queue.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 23:06:14 -04:00
Zygo Blaxell
504f4cda80 progress: move the "idle" cell to the next cycle ETA column
When all extents within a size tier have been queued, and all the
extents belong to the same file, the queue might take a long time to
fully process.  Also, any progress that is made will be obscured by
the "idle" tag in the "point" column.

Move "idle" to the next cycle ETA column, since the ETA duration will
be zero, and no useful information is lost since we would have "-"
there anyway.

Since the "point" column can now display the maximum value, lower
that maximum to 999999 so that we don't use an extra column.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 22:33:05 -04:00
Zygo Blaxell
6c36f4973f extent scan: log the bfr when removing a prealloc extent
With subvol scan, the crawl task name is the subvol/inode pair
corresponding to the file offset in the log message.  The identity of
the file can be determined by looking up the subvol/inode pair in the
log message.

With extent scan, the crawl task name is the extent bytenr corresponding
to the file offset in the log message.  This extent is deleted when the
log message is emitted, so a later lookup on the extent bytenr will not
find any references to the extent, and the identity of the file cannot
be determined.

Log the bfr, which does a /proc lookup on the name of the fd, so the
filename is logged.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 22:33:05 -04:00
Zygo Blaxell
b1bd99c077 seeker: harden against changes in the data during binary search
During the search, the region between `upper_bound` and `target_pos`
should contain no data items.  The search lowers `upper_bound` and raises
`lower_bound` until they both point to the last item before `target_pos`.

The `lower_bound` is increased to the position of the last item returned
by a search (`high_pos`) when that item is lower than `target_pos`.
This avoids some loop iterations compared to a strict binary search
algorithm, which would increase `lower_bound` only as far as `probe_pos`.

When the search runs over live extent items, occasionally a new extent
will appear between `upper_bound` and `target_pos`.  When this happens,
`lower_bound` is bumped up to the position of one of the new items, but
that position is in the "unoccupied" space between `upper_bound` and
`target_pos`, where no items are supposed to exist, so `seek_backward`
throws an exception.

To cut down on the noise, only increase `lower_bound` as far as
`upper_bound`.  This avoids the exception without increasing the number
of loop iterations for normal cases.

In the exceptional cases, extra loop iterations are needed to skip over
the new items.  This raises the worst-case number of loop iterations
by one.
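The hardened bound update amounts to clamping; a sketch, where the variable names mirror the commit message rather than bees' actual `seek_backward` implementation:

```cpp
#include <algorithm>
#include <cassert>

// When a search over live data returns an item (high_pos) that appeared in
// the supposedly-empty region between upper_bound and target_pos, raising
// lower_bound all the way to high_pos would violate the invariant
// lower_bound <= upper_bound.  Clamp the raise at upper_bound instead.
inline long raise_lower_bound(long lower_bound, long high_pos, long upper_bound) {
    return std::max(lower_bound, std::min(high_pos, upper_bound));
}
```

In the exceptional case the clamp costs at most one extra loop iteration to skip over the new items.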

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
d5e805ab8d seeker: add a real-world test case
This seek_backward failed in bees because an extent appeared during
the search:

	fetch(probe_pos = 6821971036, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821971004 = probe_pos - have_delta 32 (want_delta 32)
	fetch(probe_pos = 6821971004, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970972 = probe_pos - have_delta 32 (want_delta 32)
	fetch(probe_pos = 6821970972, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970908 = probe_pos - have_delta 64 (want_delta 64)
	fetch(probe_pos = 6821970908, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970780 = probe_pos - have_delta 128 (want_delta 128)
	fetch(probe_pos = 6821970780, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970524 = probe_pos - have_delta 256 (want_delta 256)
	fetch(probe_pos = 6821970524, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970012 = probe_pos - have_delta 512 (want_delta 512)
	fetch(probe_pos = 6821970012, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821968988 = probe_pos - have_delta 1024 (want_delta 1024)
	fetch(probe_pos = 6821968988, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821966940 = probe_pos - have_delta 2048 (want_delta 2048)
	fetch(probe_pos = 6821966940, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821962844 = probe_pos - have_delta 4096 (want_delta 4096)
	fetch(probe_pos = 6821962844, target_pos = 6821971036)
	 = 6821962845..6821962848
	found_low = true, lower_bound = 6821962845
	lower_bound = high_pos 6821962848
	loop: lower_bound 6821962848, probe_pos 6821966942, upper_bound 6821971036
	fetch(probe_pos = 6821966942, target_pos = 6821971036)
	 = 6822575316..6822575316
	upper_bound = probe_pos 6821966942
	loop: lower_bound 6821962848, probe_pos 6821964895, upper_bound 6821966942
	fetch(probe_pos = 6821964895, target_pos = 6821971036)
	 = 6822575316..6822575316
	upper_bound = probe_pos 6821964895
	loop: lower_bound 6821962848, probe_pos 6821963871, upper_bound 6821964895
	fetch(probe_pos = 6821963871, target_pos = 6821971036)
	 = 6822575316..6822575316
	upper_bound = probe_pos 6821963871
	loop: lower_bound 6821962848, probe_pos 6821963359, upper_bound 6821963871
	fetch(probe_pos = 6821963359, target_pos = 6821971036)
	 = 6821963411..6821963422
	lower_bound = high_pos 6821963422
	loop: lower_bound 6821963422, probe_pos 6821963646, upper_bound 6821963871
	fetch(probe_pos = 6821963646, target_pos = 6821971036)
	 = 6822575316..6822575316

Here, we found nothing between 6821963646 and 6822575316, so upper_bound is reduced
to 6821963646...

	upper_bound = probe_pos 6821963646
	loop: lower_bound 6821963422, probe_pos 6821963534, upper_bound 6821963646
	fetch(probe_pos = 6821963534, target_pos = 6821971036)
	 = 6821963536..6821963539
	lower_bound = high_pos 6821963539
	loop: lower_bound 6821963539, probe_pos 6821963592, upper_bound 6821963646
	fetch(probe_pos = 6821963592, target_pos = 6821971036)
	 = 6821963835..6821963841

...but here, we found 6821963835 and 6821963841, which are between
6821963646 and 6822575316.  They were not there before, so the binary
search result is now invalid because new extent items were added while
it was running.  This results in an exception:

	lower_bound = high_pos 6821963841
	--- BEGIN TRACE --- exception ---
	objectid = 27942759813120, adjusted to 27942793363456 at bees-roots.cc:1103
	Crawling extent BeesCrawlState 250:0 offset 0x0 transid 1311734..1311735 at bees-roots.cc:991
	get_state_end at bees-roots.cc:988
	find_next_extent 250 at bees-roots.cc:929
	---  END  TRACE --- exception ---
	*** EXCEPTION ***
	exception type std::out_of_range: lower_bound = 6821963841, upper_bound = 6821963646 failed constraint check (lower_bound <= upper_bound) at ../include/crucible/seeker.h:139

The exception prevents seek_backward from returning a value, which in
turn prevents consumers of that value from acting on a nonsense result.

Copy the details of this search into a test case.  Note that the test
case won't reproduce the exception because the simulation of fetch()
is not changing the results part way through.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
337bbffac1 extent scan: drop a nonsense trace message
This message appears only during exception backtraces, but it doesn't
carry any useful information.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
527396e5cb extent scan: integrate seeker debug output stream
Send both tree_search ioctl and `seek_backward` debug logs to the
same output stream, but only write that stream to the debug log if
there is an exception.

The feature remains disabled at compile time.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
bc7c35aa2d extent scan: only write a detailed debug log when there's an exception
Note that when enabled, the logs are still very CPU-intensive,
but most of the logs will be discarded.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
0953160584 trace: export exception_check
We need to call this from more than one place in bees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
80f9c147f7 btrfs-tree: clean up the fetch function's return set
Commit d32f31f411 ("btrfs-tree: harden
`rlower_bound` against exceptional objects") passes up to `seek_backward`
the first btrfs item in the result set that is above upper_bound.
This is somewhat wasteful, as `seek_backward` cannot use such a result.

Reverse that change in behavior, while keeping the rest of that commit.

This introduces a new case: the search ioctl produces only items that
are above the upper bound, so there are no usable items in the result
set, and the loop would continue until the end of the filesystem is
reached.  Handle that by setting an explicit exit variable.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
50e012ad6d seeker: add a runtime debug stream
This allows detailed but selective debugging when using the library,
particularly when something goes wrong.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
9a9644659c trace: clean up the formatting around top-level exception log messages
Fewer newlines.  More consistent application of the "TRACE:" prefix.
All at the same log level.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
fd53bff959 extent scan: drop out-of-date comment
The comment describes an earlier version which submitted each extent
ref as a separate Task, but now all extent refs are handled by the same
Task to minimize the amount of time between processing the first and
last reference to an extent.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
9439dad93a extent scan: extra check to make sure no Tasks are started when throttled
Previously `scan()` would run the extent scan loop once, and enqueue one
extent, before checking for throttling.  Do an extra check before that,
and bail out so that zero extents are enqueued when throttled.
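The shape of the fix, sketched with stand-in types (`should_throttle` and the queue are illustrative, not bees' interfaces):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Check for throttling at the top of the loop body, before anything is
// enqueued, so a throttled scan() enqueues zero extents instead of one.
inline int scan_extents(const std::function<bool()> &should_throttle,
                        const std::vector<int> &extents,
                        std::vector<int> &queue) {
    int enqueued = 0;
    for (int extent : extents) {
        if (should_throttle()) break;  // extra check: bail before enqueuing
        queue.push_back(extent);
        ++enqueued;
    }
    return enqueued;
}
```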

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
ef9b4b3a50 extent scan: shorten task name for extent map
Linux kernel thread names are hardcoded at 16 characters.  Every character
counts, and "0x" wastes two.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
7ca857dff0 docs: add the ghost subvols bug to the bugs list
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
8331f70db7 progress: fix ETA calculations
The "tm_left" field was the estimated _total_ duration of the crawl,
not the amount of time remaining.  The ETA timestamp was then calculated
based on the estimated time to run the crawl if it started _now_, not
at the start timestamp.

Fix the duration and ETA calculations.
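A sketch of the corrected arithmetic (field and function names are illustrative):

```cpp
#include <cassert>

// Estimate the crawl's total duration from progress so far, then derive the
// time remaining and an ETA anchored at the crawl's *start* timestamp,
// rather than treating the whole estimated duration as "time left" or
// computing the ETA as if the crawl started now.
struct Eta {
    double total_duration;  // estimated seconds for the whole crawl
    double time_left;       // estimated seconds remaining
    double eta_timestamp;   // absolute time at which the crawl should finish
};

inline Eta compute_eta(double bytes_done, double bytes_total,
                       double start_time, double now) {
    const double elapsed = now - start_time;
    const double total = elapsed * bytes_total / bytes_done;  // caller ensures bytes_done > 0
    return Eta{ total, total - elapsed, start_time + total };
}
```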

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Steven Allen
a844024395 Make the runtime directory private
The status file contains sensitive information like filenames and duplicate chunk ranges. It might also make sense to set the process-wide `UMask=`, but that may have other unintended side effects.
2025-03-26 15:02:42 +00:00
Zygo Blaxell
47243aef14 hash: handle $BEESHOME on btrfs too
The `_nothrow` variants of `do_ioctl` return true when they succeed,
which is the opposite of what `ioctl` does.

Fix the logic so bees can correctly identify its own hash table when
it's on the same filesystem as the target.
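A minimal sketch of the convention that bit this commit (bees' real `do_ioctl_nothrow` is more general; this only shows the inverted sense relative to raw `ioctl(2)`):

```cpp
#include <cassert>
#include <sys/ioctl.h>

// ioctl(2) returns 0 on success and -1 on error; the _nothrow wrappers
// return true on success.  Logic written for one convention silently
// inverts when applied to the other.
inline bool do_ioctl_nothrow_sketch(int fd, unsigned long request, void *arg) {
    return ioctl(fd, request, arg) == 0;
}
```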

Fixes: f6908420ad ("hash: handle $BEESHOME on non-btrfs")
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-17 21:18:08 -05:00
Zygo Blaxell
a670aa5a71 extent scan: don't divide by zero if there were no loops
Commit 183b6a5361 ("extent scan: refactor
BeesCrawl, BeesScanMode*") moved some statistics calculations out of
the loop in `find_next_extent`, but did not ensure that the statistics
would not be calculated if the loop had not executed any iterations.

In rare instances, the function returns without entering the loop at all,
which results in divide by zero.  Add a check just before doing that.
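The fix reduces to an early return before the statistics math; a sketch with illustrative names:

```cpp
#include <cassert>

// Skip the per-loop statistics entirely when the loop ran zero times,
// instead of dividing the accumulated cost by a zero iteration count.
inline double average_search_cost(double total_cost, unsigned long loop_count) {
    if (loop_count == 0) return 0.0;   // the guard added by the fix
    return total_cost / loop_count;
}
```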

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-13 23:59:42 -05:00
Zygo Blaxell
51b3bcdbe4 trace: deprecate BEESLOGTRACE, align trace logs with exception notices
Exceptions were logged at level NOTICE while the stack traces were logged
at level DEBUG.  That produced useless noise in the output with `-v5`
or `-v6`, where there were exception headings logged, but no details.

Fix that by placing the exceptions and traces at level DEBUG, but prefix
them with `TRACE:` for easy grepping.

Most of the events associated with BEESLOGTRACE either never happen,
or they are harmless (e.g. trying to open deleted files or subvols).
Reassign them to ordinary BEESLOGDEBUG, with one exception for
unrecognized Extent flags that should be debugged if any appear.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-13 23:59:42 -05:00
Zygo Blaxell
ae58401d53 trace: avoid one copy in every trace function
While investigating https://github.com/Zygo/bees/issues/282 I noticed that
we're doing at least one unnecessary extra copy of the functor in BEESTRACE.
Get rid of it with a const reference.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-13 23:59:42 -05:00
Zygo Blaxell
3e7eb43b51 BeesStringFile: figure out when to call--or _not_ call--fsync
Older kernel versions featured some bugs in btrfs `fsync`, which could
leave behind "ghost dirents", orphan filename items that did not have
a corresponding inode.  These dirents were created during log replay
during the first mount after a crash due to several different bugs in
the log tree and its use over the years.  The last known bug of this
kind was fixed in kernel 5.16.  As of this writing, no fixes for this
bug have been backported to any earlier LTS kernel.

Some filesystems, including btrfs, will flush the contents of a new
file before renaming it over an old file.  On paper, btrfs can do this
very cheaply since the contents of the new file are not referenced, and
the old file is not dereferenced, until a tree commit which includes both
actions atomically; however, in real life, btrfs provides `fsync`-like
semantics and uses the log-tree infrastructure to implement them, which
compromises performance and acts as a magnet for bugs.

The benefit of this trade-off is that `rename` can be used as a
synchronization point for data outside of the btrfs, which would not
happen if everything `rename` does was simply deferred to the next
tree commit.  The cost of this trade-off is that for the first 8 years
of its existence, bees would trigger the bug so often that the project
recommended its users put $BEESHOME in its own subvol to make it easy
to remove ghost dirents left behind by the bug.

Some other filesystems, such as xfs, don't have any special semantics for
`rename`, and require `fsync` to avoid garbage or missing data after
a crash.  Even filesystems which do have a special case for `rename`
can be configured to turn it off.

btrfs will silently delete data from files in the event that an
unrecoverable data block write error occurs.  Kernel version 6.2 adds
important new and unexpected cases where this can happen on filesystems
using raid56 data, but it also happens in all usable btrfs versions
(the silent deletion behavior was introduced in kernel version 3.9).

Unrecoverable write errors are currently reported to userspace only
through `fsync`.  Since the failed extents are deleted, they cannot be
detected via csum failures or scrub after the fact--and it's too late
by then, the data is already gone.  `fsync` is the last opportunity
to detect the write failure before the `rename`.  If the error is not
detected, the contents of the file will be silently discarded in btrfs.
The impact on bees is that scans will abruptly restart from zero after
a crash combined with some other reasonably common failures.

Putting all of this together leads to a rather complex workaround:
if the filesystem under $BEESHOME (specifically, the filesystem where
BeesStringFile objects such as `beescrawl.dat` are written) is a btrfs
filesystem, and the host kernel is a version prior to 5.16, then don't
call `fsync` before `rename`.  In all other cases, do call `fsync`,
and prevent dependent writes (i.e. the following `rename`) in the event
of errors.

Since present kernel versions still require `fsync`, we don't need
an upper bound on the kernel version check until someone fixes btrfs
`rename` (or perhaps adds a flag to `renameat2` which prevents use of
the log tree) in the kernel.  Once that fix happens, we can drop the
`fsync` call for kernels after that fixed version.
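A minimal POSIX sketch of the write-fsync-rename pattern described above, with the `fsync` made conditional; the helper name and error handling are illustrative, not bees' `BeesStringFile` code:

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Write a temporary file, fsync it (unless the btrfs-before-5.16 workaround
// applies), and only then rename it over the old file.  A write error
// surfaced by fsync blocks the dependent rename.
inline bool write_file_atomically(const std::string &path,
                                  const std::string &data, bool do_fsync) {
    const std::string tmp = path + ".tmp";
    int fd = open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    bool ok = write(fd, data.data(), data.size()) ==
              static_cast<ssize_t>(data.size());
    if (ok && do_fsync && fsync(fd) != 0) ok = false;  // detect write failure here
    if (close(fd) != 0) ok = false;
    if (!ok) { unlink(tmp.c_str()); return false; }    // failed: don't rename
    return rename(tmp.c_str(), path.c_str()) == 0;
}
```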

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-10 21:04:20 -05:00
Zygo Blaxell
962d94567c hexdump: fix pointer cast const mismatch
Another hit from the exotic compiler collection:  build fails on GCC 9,
from Ubuntu 20...but not later versions of GCC.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-10 21:00:31 -05:00
Zygo Blaxell
6dbef5f27b fs: improve compatibility with linux-libc-dev 5.4
Fix the missing symbols that popped up when adding chunk tree to
lib/fs.cc.  Also define the missing symbols instead of merely trying to
avoid them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-08 21:17:15 -05:00
Zygo Blaxell
88b1e4ca6e main: unconditionally enable workaround for the logical_ino-vs-clone kernel bug
This obviously doesn't fix or prevent the kernel bug, but it does prevent
bees from triggering the bug without assistance from another application.

The bug can still be triggered by running bees at the same time as an
application which uses clone or LOGICAL_INO.  `btdu` uses LOGICAL_INO,
while `cp` from coreutils (and many others) use clone (reflink copy).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 23:14:16 -05:00
Zygo Blaxell
c1d7fa13a5 roots: drop unnecessary mutex unlock in stop_request
In commit 31b2aa3c0d ("context: speed
up orderly process termination"), the stop request was split into two
methods after the mutex unlock.

Now that there's nothing after the mutex unlock in `stop_request`,
there's no need for an explicit unlock to do what the destructor would
have done anyway.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 23:14:16 -05:00
Zygo Blaxell
aa39bddb2d extent scan: implement an experimental ordered scan mode
Parallel scan runs each extent size tier in a separate thread.  The
threads compete to process extents within the tier's size range.

Ordered scan processes each extent size tier completely before moving on
to the next.  In theory, this means large extents always get processed
quickly, especially when new ones appear, and the queue does not fill up
with small extents.

In practice, the multi-threaded scanner massively outperforms the
single-threaded scanner, unless the number of worker threads is very
small (i.e. one).

Disable most of the feature for now, but leave the code in place so it
can be easily reactivated for future testing.

Ordered scan introduces a parallelized extent mapper Task.  Keep that in
parallel scan mode, which further enhances the parallelism.  The extent
scan crawl threads now run at 'idle' priority while the map tasks run
at normal priority, so the map tasks don't flood the task queue.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 23:14:16 -05:00
Zygo Blaxell
1aea2d2f96 crawl: deprecate use of BeesCrawl to search the extent tree
BeesScanModeExtent can do that by itself now.  Overloading the subvol
crawl code resulted in an ugly, inefficient hack, and we definitely
don't want to accidentally continue to use it.

Remove the support for reading the extent tree and add some `assert`s
to make sure it isn't still used somewhere.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:43:22 -05:00
Zygo Blaxell
673b450671 docs: update event counters after extent scan refactoring and crawl skipping
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:43:22 -05:00
Zygo Blaxell
183b6a5361 extent scan: refactor BeesCrawl, BeesScanMode*
The main gains here are:

* Move extent tree searches into BeesScanModeExtent so that they are
not slowed down by the BeesCrawl code, which was designed for the
much more specialized metadata in subvol trees.
* Enable short extent skipping now that BeesCrawl is out of the way.
* Stop enumerating btrfs subvols when in extent scan mode.

All this gets rid of >99% of unnecessary extent tree searches.
Incremental extent scan cycles now finish in milliseconds instead
of minutes.

BeesCrawl was never designed to cope with the structure and content of
the extent tree.  It would waste thousands of tree-search ioctl calls
reading and ignoring metadata items.

Performance was particularly bad when a binary search was involved, as any
binary search probe that landed in a metadata block group would read and
discard all the metadata items in the block group, sequentially, repeated
for each level of the binary search.  This was blocking implementation of
short extent skipping optimization for large extent size tiers, because
the skips were using thousands of tree searches to skip over only a few
hundred extent items.

Extent scan also had to read every extent item twice to do the
transid filtering, because BeesCrawl's interface discarded the relevant
information when it converted a `BtrfsTreeItem` into a `BeesFileRange`.
The cost of this extra fetch was negligible, but it could have been zero.

Fix this by:

* Copy the equivalent of `fetch_extents` from BeesCrawl into
`BeesScanModeExtent`, then give each of the extent scan crawlers its
own `BtrfsDataExtentTreeFetcher` instance.  This enables extent tree
searches to avoid pure (non-mixed) metadata block groups.  `BeesCrawl`
is now used only for its interface to `BeesRoots` for saving state in
`beescrawl.dat`, and never to determine the next extent tree item.

* Move subvol-specific parts of `BeesRoots` into a new class
`BeesScanModeSubvol` so that `BeesScanModeExtent` doesn't have to enable
or support them.  In particular, `bees -m4` no longer enumerates all
of the _subvol_ crawlers.  `BeesRoots` is still used to save and load
crawl state.

* Move several members from `BeesScanModeExtent` into a per-crawler
state object `SizeTier` to eliminate the need for some locks and to
maintain separate cache state for `BtrfsDataExtentTreeFetcher`.

* Reuse the `BtrfsTreeItem` to get the generation field for the transid
range filter.

* Avoid a few corner cases when handling errors, where extent scan might
drop an extent without scanning it, or fail to advance to the next extent.

* Enable the extent-skipping algorithm for large size tiers, now that
`BeesCrawl::fetch_extents` is no longer slowing it down.

* Add a debug stream interface which developers can easily turn on when
needed to inspect the decisions that extent scan is making.

* Track metrics that are more useful, particularly searches per extent
scanned, and fraction of extents that are skipped.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:43:22 -05:00
Zygo Blaxell
b6446d7316 roots: rework open_root_nocache to use btrfs-tree
This gets rid of one open-coded btrfs tree search.

Also reduce the log noise level for subvol open failures, and remove
some ancient references to `BEESLOG`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
d32f31f411 btrfs-tree: harden rlower_bound against exceptional objects
Rearrange the logic in `rlower_bound` so it can cope with a tree
that contains mostly block-aligned objects, with a few exceptions
filtered out by `hdr_stop`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
dd08f6379f btrfs-tree: add a method to get root backref items to BtrfsRootFetcher
This complements the already existing support for reading the fields of
a root backref.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
58ee297cde btrfs-tree: connect methods to the debug stream interface
In some cases functions already had existing debug stream support
which can be redirected to the new interface.  In other cases, new
debug messages are added.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
a3c0ba0d69 fs: add a runtime debug stream for btrfs tree searches
This allows plugging in an ostream at run time so that we can audit all
the search calls we are doing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
75040789c6 btrfs-tree: drop BtrfsFsTreeFetcher and clean up class comments
BtrfsFsTreeFetcher was used for early versions of the extent scanner, but
neither subvol nor extent scan now needs an object that is both persistent
and configured to access only one subvol.  BtrfsExtentDataFetcher does
the same thing in that case.

Clarify the comments on what the remaining classes do, so that
BtrfsFsTreeFetcher doesn't get inadvertently reinvented in the future.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
f9a697518d btrfs-tree: introduce BtrfsDataExtentTreeFetcher to read data extents without metadata
Binary searches can be extremely slow if the target bytenr is near a
metadata block group, because metadata items are not visible to the
binary search algorithm.  In a non-mixed-bg filesystem, there can be
hundreds of thousands of metadata items between data extent items, and
since the binary search algorithm can't see them, it will run searches
that iterate over hundreds of thousands of objects about a dozen times.

This is less of a problem for mixed-bg filesystems because the data and
metadata blocks are not isolated from each other.  The binary search
algorithm still can't see the metadata items, but there are usually
some data items close by to prevent the linear item filter from running
too long.

Introduce a new fetcher class (all the good names were taken) that tracks
where the end of the current block group is.  When the end of the current
block group is reached in the linear search, skip ahead to a block group
that can contain data items.
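A toy model of the skip (the map-based layout is illustrative; the real fetcher reads chunk items to find block-group boundaries):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Block groups modeled as start -> (length, is_data).  When the linear item
// filter walks past the end of the current block group, jump straight to the
// next group that can contain data items, instead of iterating item by item
// through a metadata-only group.
using BlockGroups = std::map<uint64_t, std::pair<uint64_t, bool>>;

inline uint64_t next_data_pos(const BlockGroups &groups, uint64_t pos) {
    for (const auto &g : groups) {
        const uint64_t start = g.first;
        const uint64_t end = start + g.second.first;
        const bool is_data = g.second.second;
        if (is_data && pos < end) return pos > start ? pos : start;
    }
    return UINT64_MAX;  // past the last data block group
}
```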

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
c4ba6ec269 fs: add a ntoa function for chunk types
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
440740201a main: the base directory for --strip-paths should be root_fd, not cwd
The cwd is where core dumps and various profiling and verification
libraries want to write their data, whereas root_fd is the root of the
target filesystem.  These are often intentionally different.  When
they are different, `--strip-paths` sets the wrong prefix to strip
from paths.

Once the root fd has been established, we can set the path prefix to
the string prefix that we'll get from future calls to `name_fd`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
f6908420ad hash: handle $BEESHOME on non-btrfs
bees explicitly supports storing $BEESHOME on another filesystem, and
does not require that filesystem to be btrfs; however, if $BEESHOME
is on a non-btrfs filesystem, there is an exception on every startup
when trying to identify the subvol root of the hash table file in order
to blacklist it, because non-btrfs filesystems don't have subvol roots.

Fix by checking not only whether $BEESHOME is on btrfs, but whether it
is on the _same_ btrfs as the bees root, without throwing an exception.
The hash table is blacklisted only when both filesystems are btrfs and
have the same fsid.
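A generic analogue of that check using `statfs(2)`'s `f_fsid` (bees actually uses the btrfs FS_INFO ioctl, and also verifies that both filesystems are btrfs before comparing):

```cpp
#include <cassert>
#include <cstring>
#include <sys/vfs.h>

// Blacklist the hash table only when both paths report the same fsid.
// Any statfs failure is treated as "not the same" rather than an exception.
inline bool same_fsid(const char *path_a, const char *path_b) {
    struct statfs a{}, b{};
    if (statfs(path_a, &a) != 0 || statfs(path_b, &b) != 0) return false;
    return std::memcmp(&a.f_fsid, &b.f_fsid, sizeof a.f_fsid) == 0;
}
```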

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
925b12823e fs: add do_ioctl_nothrow and fsid methods to btrfs fs info
Enable use of the ioctl to probe whether two fds refer to the same btrfs,
without throwing an exception.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
561e604edc seeker: turn off debug logging
The debug log is only revealed when something goes wrong, but it is
created and discarded every time `seek_backward` is called, and it
is quite CPU-intensive.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
30cd375d03 readahead: clean up the code, update docs
Remove dubious comments and #if 0 section.  Document new event counters,
and add one for read failures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
48b7fbda9c progress: adjust minimum thresholds for ETA to 10 seconds and 1 GiB of data
1% is a lot of data on a petabyte filesystem, and a long time to wait for an
ETA.

After 1 GiB we should have some idea of how fast we're reading the data.
Increase the time to 10 seconds to avoid a nonsense result just after a scan
starts.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
85aba7b695 openat2: #include <linux/types.h> so we can know __u64
Alternative implementations could use `uint64_t` instead, from `cstdint`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-20 17:02:19 -05:00
Zygo Blaxell
de38b46dd8 scripts/beesd: harden the mount options
 * `nodev`: This reduces rename attack surface by preventing bees from
 opening any device file on the target filesystem.

 * `noexec`: This prevents access to the mount point from being leveraged
 to execute setuid binaries, or execute anything at all through the
 mount point.

These options are not required because they duplicate features in the
bees binary (assuming that the mount namespace remains private):

 * `noatime`: bees always opens every file with `O_NOATIME`, making
 this option redundant.

 * `nosymfollow`: bees uses `openat2` on kernels 5.6 and later with
 flags that prevent symlink attacks.  `nosymfollow` was introduced in
 kernel 5.10, so every kernel that can do `nosymfollow` can already do
 `openat2`.  Also, historically, `$BEESHOME` can be a relative path with
 symlinks in any path component except the last one, and `nosymfollow`
 doesn't allow that.

Between `openat2` and `nodev`, all symlink attacks are prevented, and
rename attacks cannot be used to force bees to open a device file.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-20 01:00:41 -05:00
Zygo Blaxell
0abf6ebb3d scripts/beesd: no need for $BEESHOME to be a subvol
We _recommend_ that `$BEESHOME` be a subvol, and we'll create a subvol
if no directory exists; however, there's no reason to reject an existing
plain directory if the user chooses to use one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-20 00:43:13 -05:00
Kai Krakow
360ce7e125 scripts/beesd: Unshare namespace without systemd
If the beesd script is started without systemd, the mount point won't
be automatically unmounted when the script is cancelled with Ctrl+C.

Fixes: https://github.com/Zygo/bees/issues/281
Signed-off-by: Kai Krakow <kai@kaishome.de>
2025-01-20 00:05:57 -05:00
Zygo Blaxell
ad11db2ee1 openat2: supply the missing definitions for building with old headers and new kernel
Apparently Ubuntu 20 has upgraded to kernel 5.15, but still builds things
with 5.4 headers.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-19 22:20:06 -05:00
Zygo Blaxell
874832dc58 openat2: log a warning when we fall back to openat
This should occur only once per run, but it's worth leaving a note
that it has happened.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-19 22:19:42 -05:00
Zygo Blaxell
5fe89d85c3 extent scan: make sure we run every extent crawler once per transaction
There's a pathological case where all of the extent scan crawlers except
one are at the end of a crawl cycle, but the one crawler that is still
running is keeping the Task queue full.  The result is that bees never
starts the other extent scan crawlers, because the queue is always
full at the instant a new transid triggers the start of a new scan.
That's bad because it will result in bees falling behind when new data
from the inactive size tiers appears.

To fix this, check for throttling _after_ creating at least one scan task
in each crawler.  That will keep the crawlers running, and possibly allow
them to claw back some space in the Task queue.  It slightly overcommits
the Task queue, so there will be a few more Tasks than nominally allowed.

Also (re)introduce some hysteresis in the queue size limit and reduce it
a little, so that bees isn't continually stopping and restarting crawls
every time one task is created or completed, and so that we stay under
the configured Task limit despite overcommitting.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-19 22:19:42 -05:00
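A simplified model of the scheduling change described above (the `Crawler` and `Throttle` types here are illustrative, not bees's classes): every crawler queues at least one task before the throttle is consulted, and the throttle keeps separate high and low watermarks so it doesn't flap on each task created or completed:

```cpp
#include <cstddef>
#include <vector>

struct Crawler {
	size_t remaining = 0;      // extents left to queue
	size_t tasks_created = 0;  // tasks queued so far
};

// Hysteresis: engage above the high watermark, release only after the
// queue drains below the low watermark.
class Throttle {
	size_t m_high, m_low;
	bool m_throttled = false;
public:
	Throttle(size_t high, size_t low) : m_high(high), m_low(low) {}
	bool check(size_t queue_size) {
		if (queue_size > m_high) m_throttled = true;
		else if (queue_size < m_low) m_throttled = false;
		return m_throttled;
	}
};

void run_crawlers(std::vector<Crawler> &crawlers, Throttle &throttle, size_t &queue_size) {
	for (auto &c : crawlers) {
		while (c.remaining > 0) {
			// Queue at least one task per crawler before throttling,
			// slightly overcommitting the queue if necessary...
			--c.remaining;
			++c.tasks_created;
			++queue_size;
			// ...then pause this crawler once the throttle engages.
			// The next crawler still gets its turn, so none is starved.
			if (throttle.check(queue_size))
				break;
		}
	}
}
```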
Zygo Blaxell
a2b3e1e0c2 log: demote a lot of BEESLOGWARN to higher verbosity levels
Toxic extent workarounds are going away because the underlying kernel
bugs have been fixed.  They are no longer worthy of spamming non-developer
logs.

INO_PATHS can return no paths if an inode has been deleted.  It doesn't
need a log message at all, much less one at WARN level.

Dedupe failure can be INFO, the same level as dedupe itself, especially
since the "NO dedupe" message doesn't mention what was [not] deduped.

Inspired by Kai Krakow's "context: demote "abandoned toxic match" to
debug log level".

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-19 01:08:28 -05:00
Kai Krakow
aaec931081 context: demote "abandoned toxic match" to debug log level
This log message creates an overwhelming number of messages in the
system journal, leading to write-back flushing storms under high
activity. As it is a workaround message, it is probably only useful to
developers, so demote it to debug level.

This fixes latency spikes in desktop usage after adding a lot of new
files, especially since systemd-journald starts to flush caches when it
sees memory pressure.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2025-01-19 00:59:22 -05:00
23 changed files with 1144 additions and 622 deletions

View File

@@ -55,6 +55,7 @@ These bugs are particularly popular among bees users, though not all are specifi
| 5.4 | 5.11 | spurious tree checker failures on extent ref hash | 5.4.125, 5.10.43, 5.11.5, 5.12 and later | 1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match
| - | 5.11 | tree mod log issue #5 | 4.4.263, 4.9.263, 4.14.227, 4.19.183, 5.4.108, 5.10.26, 5.11.9, 5.12 and later | dbcc7d57bffc btrfs: fix race when cloning extent buffer during rewind of an old root
| - | 5.12 | tree mod log issue #6 | 4.14.233, 4.19.191, 5.4.118, 5.10.36, 5.11.20, 5.12.3, 5.13 and later | f9690f426b21 btrfs: fix race when picking most recent mod log operation for an old root
| 5.11 | 5.12 | subvols marked for deletion with `btrfs sub del` become permanently undeletable ("ghost" subvols) | 5.12 stopped creation of new ghost subvols | Partially fixed in 8d488a8c7ba2 btrfs: fix subvolume/snapshot deletion not triggered on mount. Qu wrote a [patch](https://github.com/adam900710/linux/commit/9de990fcc8864c376eb28aa7482c54321f94acd4) to allow `btrfs sub del -i` to remove "ghost" subvols, but it was never merged upstream.
| 4.15 | 5.16 | spurious warnings from `fs/fs-writeback.c` when `flushoncommit` is enabled | 5.15.27, 5.16.13, 5.17 and later | a0f0cf8341e3 btrfs: get rid of warning on transaction commit when using flushoncommit
| - | 5.17 | crash during device removal can make filesystem unmountable | 5.15.54, 5.16.20, 5.17.3, 5.18 and later | bbac58698a55 btrfs: remove device item and update super block in the same transaction
| - | 5.18 | wrong superblock num_devices makes filesystem unmountable | 4.14.283, 4.19.247, 5.4.198, 5.10.121, 5.15.46, 5.17.14, 5.18.3, 5.19 and later | d201238ccd2f btrfs: repair super block num_devices automatically

View File

@@ -120,13 +120,14 @@ The `crawl` event group consists of operations related to scanning btrfs trees t
* `crawl_again`: An inode crawl was restarted because the extent was already locked by another running crawl.
* `crawl_blacklisted`: An extent was not scanned because it belongs to a blacklisted file.
* `crawl_create`: A new subvol or extent crawler was created.
* `crawl_deferred_inode`: Two tasks attempted to scan the same inode at the same time, so one was deferred.
* `crawl_done`: One pass over a subvol was completed.
* `crawl_discard`: An extent that didn't match the crawler's size tier was discarded.
* `crawl_discard_high`: An extent that was too large for the crawler's size tier was discarded.
* `crawl_discard_low`: An extent that was too small for the crawler's size tier was discarded.
* `crawl_empty`: A `TREE_SEARCH_V2` ioctl call failed or returned an empty set (usually because all data in the subvol was scanned).
* `crawl_extent`: The extent crawler queued all references to an extent for processing.
* `crawl_fail`: A `TREE_SEARCH_V2` ioctl call failed.
* `crawl_flop`: Small extent items were not skipped because the next extent started at or before the end of the previous extent.
* `crawl_gen_high`: An extent item in the search results refers to an extent that is newer than the current crawl's `max_transid` allows.
* `crawl_gen_low`: An extent item in the search results refers to an extent that is older than the current crawl's `min_transid` allows.
* `crawl_hole`: An extent item in the search results refers to a hole.
@@ -138,6 +139,8 @@ The `crawl` event group consists of operations related to scanning btrfs trees t
* `crawl_prealloc`: An extent item in the search results refers to a `PREALLOC` extent.
* `crawl_push`: An extent item in the search results is suitable for scanning and deduplication.
* `crawl_scan`: An extent item in the search results is submitted to `BeesContext::scan_forward` for scanning and deduplication.
* `crawl_skip`: Small extent items were skipped because no extent of sufficient size was found within the minimum search distance.
* `crawl_skip_ms`: Time spent skipping small extent items.
* `crawl_search`: A `TREE_SEARCH_V2` ioctl call was successful.
* `crawl_throttled`: Extent scan created too many work queue items and was prevented from creating any more.
* `crawl_tree_block`: Extent scan found and skipped a metadata tree block.
@@ -281,11 +284,14 @@ The `progress` event group consists of events related to progress estimation.
readahead
---------
The `readahead` event group consists of events related to calls to `posix_fadvise`.
The `readahead` event group consists of events related to data prefetching (formerly calls to `posix_fadvise` or `readahead`, but now emulated in userspace).
* `readahead_bytes`: Number of bytes prefetched.
* `readahead_count`: Number of read calls.
* `readahead_clear`: Number of times the duplicate read cache was cleared.
* `readahead_skip`: Number of times a duplicate read was identified in the cache and skipped.
* `readahead_fail`: Number of read errors during prefetch.
* `readahead_ms`: Total time spent emulating readahead in user-space (kernel readahead is not measured).
* `readahead_skip`: Number of times a duplicate read was identified in the cache and skipped.
* `readahead_unread_ms`: Total time spent running `posix_fadvise(..., POSIX_FADV_DONTNEED)`.
replacedst

View File

@@ -173,34 +173,42 @@ namespace crucible {
void get_sums(uint64_t logical, size_t count, function<void(uint64_t logical, const uint8_t *buf, size_t count)> output);
};
/// Fetch extent items from extent tree
/// Fetch extent items from extent tree.
/// Does not filter out metadata! See BtrfsDataExtentTreeFetcher for that.
class BtrfsExtentItemFetcher : public BtrfsTreeObjectFetcher {
public:
BtrfsExtentItemFetcher(const Fd &fd);
};
/// Fetch extent refs from an inode
/// Fetch extent refs from an inode. Caller must set the tree and objectid.
class BtrfsExtentDataFetcher : public BtrfsTreeOffsetFetcher {
public:
BtrfsExtentDataFetcher(const Fd &fd);
};
/// Fetch inodes from a subvol
class BtrfsFsTreeFetcher : public BtrfsTreeObjectFetcher {
public:
BtrfsFsTreeFetcher(const Fd &fd, uint64_t subvol);
};
/// Fetch raw inode items
class BtrfsInodeFetcher : public BtrfsTreeObjectFetcher {
public:
BtrfsInodeFetcher(const Fd &fd);
BtrfsTreeItem stat(uint64_t subvol, uint64_t inode);
};
/// Fetch a root (subvol) item
class BtrfsRootFetcher : public BtrfsTreeObjectFetcher {
public:
BtrfsRootFetcher(const Fd &fd);
BtrfsTreeItem root(uint64_t subvol);
BtrfsTreeItem root_backref(uint64_t subvol);
};
/// Fetch data extent items from extent tree, skipping metadata-only block groups
class BtrfsDataExtentTreeFetcher : public BtrfsExtentItemFetcher {
BtrfsTreeItem m_current_bg;
BtrfsTreeOffsetFetcher m_chunk_tree;
protected:
virtual void next_sk(BtrfsIoctlSearchKey &key, const BtrfsIoctlSearchHeader &hdr) override;
public:
BtrfsDataExtentTreeFetcher(const Fd &fd);
};
}

View File

@@ -78,9 +78,6 @@ enum btrfs_compression_type {
#define BTRFS_SHARED_BLOCK_REF_KEY 182
#define BTRFS_SHARED_DATA_REF_KEY 184
#define BTRFS_BLOCK_GROUP_ITEM_KEY 192
#define BTRFS_FREE_SPACE_INFO_KEY 198
#define BTRFS_FREE_SPACE_EXTENT_KEY 199
#define BTRFS_FREE_SPACE_BITMAP_KEY 200
#define BTRFS_DEV_EXTENT_KEY 204
#define BTRFS_DEV_ITEM_KEY 216
#define BTRFS_CHUNK_ITEM_KEY 228
@@ -97,6 +94,18 @@ enum btrfs_compression_type {
#endif
#ifndef BTRFS_FREE_SPACE_INFO_KEY
#define BTRFS_FREE_SPACE_INFO_KEY 198
#define BTRFS_FREE_SPACE_EXTENT_KEY 199
#define BTRFS_FREE_SPACE_BITMAP_KEY 200
#define BTRFS_FREE_SPACE_OBJECTID -11ULL
#endif
#ifndef BTRFS_BLOCK_GROUP_RAID1C4
#define BTRFS_BLOCK_GROUP_RAID1C3 (1ULL << 9)
#define BTRFS_BLOCK_GROUP_RAID1C4 (1ULL << 10)
#endif
#ifndef BTRFS_DEFRAG_RANGE_START_IO
// For some reason uapi has BTRFS_DEFRAG_RANGE_COMPRESS and

View File

@@ -201,11 +201,13 @@ namespace crucible {
static thread_local size_t s_calls;
static thread_local size_t s_loops;
static thread_local size_t s_loops_empty;
static thread_local shared_ptr<ostream> s_debug_ostream;
};
ostream & operator<<(ostream &os, const btrfs_ioctl_search_key &key);
ostream & operator<<(ostream &os, const BtrfsIoctlSearchKey &key);
string btrfs_chunk_type_ntoa(uint64_t type);
string btrfs_search_type_ntoa(unsigned type);
string btrfs_search_objectid_ntoa(uint64_t objectid);
string btrfs_compress_type_ntoa(uint8_t type);
@@ -246,9 +248,11 @@ namespace crucible {
struct BtrfsIoctlFsInfoArgs : public btrfs_ioctl_fs_info_args_v3 {
BtrfsIoctlFsInfoArgs();
void do_ioctl(int fd);
bool do_ioctl_nothrow(int fd);
uint16_t csum_type() const;
uint16_t csum_size() const;
uint64_t generation() const;
vector<uint8_t> fsid() const;
};
ostream & operator<<(ostream &os, const BtrfsIoctlFsInfoArgs &a);

View File

@@ -13,7 +13,7 @@ namespace crucible {
hexdump(ostream &os, const V &v)
{
const auto v_size = v.size();
const uint8_t* const v_data = reinterpret_cast<uint8_t*>(v.data());
const uint8_t* const v_data = reinterpret_cast<const uint8_t*>(v.data());
os << "V { size = " << v_size << ", data:\n";
for (size_t i = 0; i < v_size; i += 8) {
string hex, ascii;

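The one-line fix above adds the missing `const` to the cast target: `v.data()` returns a pointer to `const` when the container is `const`, so the old non-const cast failed to compile for const arguments. A minimal sketch of the corrected pattern (`hex_bytes` is an illustrative stand-in, not the real `hexdump`):

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// The container is taken by const reference, so the cast target must
// also be const-qualified: const uint8_t *, not uint8_t *.
template <class V>
std::string hex_bytes(const V &v) {
	const auto *data = reinterpret_cast<const uint8_t *>(v.data());
	std::ostringstream os;
	for (size_t i = 0; i < v.size(); ++i) {
		os << std::hex << std::setw(2) << std::setfill('0')
		   << unsigned(data[i]);
	}
	return os.str();
}
```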
View File

@@ -1,11 +1,46 @@
#ifndef CRUCIBLE_OPENAT2_H
#define CRUCIBLE_OPENAT2_H
#include <cstdlib>
// Compatibility for building on old libc for new kernel
#include <linux/version.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 6, 0)
#include <linux/openat2.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>
#else
#include <linux/types.h>
#ifndef RESOLVE_NO_XDEV
#define RESOLVE_NO_XDEV 1
// RESOLVE_NO_XDEV was there from the beginning of openat2,
// so if that's missing, so is open_how
struct open_how {
__u64 flags;
__u64 mode;
__u64 resolve;
};
#endif
#ifndef RESOLVE_NO_MAGICLINKS
#define RESOLVE_NO_MAGICLINKS 2
#endif
#ifndef RESOLVE_NO_SYMLINKS
#define RESOLVE_NO_SYMLINKS 4
#endif
#ifndef RESOLVE_BENEATH
#define RESOLVE_BENEATH 8
#endif
#ifndef RESOLVE_IN_ROOT
#define RESOLVE_IN_ROOT 16
#endif
#endif // Linux version >= v5.6
extern "C" {

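With fallback definitions like the ones above in place, `openat2` can be invoked through `syscall(2)` on any libc. A hedged sketch (using `uint64_t` fields as the commit message suggests; the hard-coded syscall number is x86_64-only and purely illustrative):

```cpp
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Fallback definitions, as in the compatibility header above.
#ifndef RESOLVE_NO_SYMLINKS
struct open_how { uint64_t flags, mode, resolve; };
#define RESOLVE_NO_SYMLINKS 4
#endif
#ifndef SYS_openat2
#define SYS_openat2 437  // x86_64 value; illustrative only
#endif

// Open a path, refusing to resolve any symlink in any component.
// Returns an fd, or -1 with errno == ENOSYS on pre-5.6 kernels,
// at which point the caller can fall back to plain openat().
int open_no_symlinks(const char *path) {
	struct open_how how;
	memset(&how, 0, sizeof(how));  // mode must be 0 without O_CREAT
	how.flags = O_RDONLY;
	how.resolve = RESOLVE_NO_SYMLINKS;
	return syscall(SYS_openat2, AT_FDCWD, path, &how, sizeof(how));
}
```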
View File

@@ -6,23 +6,23 @@
#include <algorithm>
#include <limits>
#include <cstdint>
#if 1
// Debug stream
#include <memory>
#include <iostream>
#include <sstream>
#define DINIT(__x) __x
#define DLOG(__x) do { logs << __x << std::endl; } while (false)
#define DOUT(__err) do { __err << logs.str(); } while (false)
#else
#define DINIT(__x) do {} while (false)
#define DLOG(__x) do {} while (false)
#define DOUT(__x) do {} while (false)
#endif
#include <cstdint>
namespace crucible {
using namespace std;
extern thread_local shared_ptr<ostream> tl_seeker_debug_str;
#define SEEKER_DEBUG_LOG(__x) do { \
if (tl_seeker_debug_str) { \
(*tl_seeker_debug_str) << __x << "\n"; \
} \
} while (false)
// Requirements for Container<Pos> Fetch(Pos lower, Pos upper):
// - fetches objects in Pos order, starting from lower (must be >= lower)
// - must return upper if present, may or may not return objects after that
@@ -49,113 +49,108 @@ namespace crucible {
Pos
seek_backward(Pos const target_pos, Fetch fetch, Pos min_step = 1, size_t max_loops = numeric_limits<size_t>::max())
{
DINIT(ostringstream logs);
try {
static const Pos end_pos = numeric_limits<Pos>::max();
// TBH this probably won't work if begin_pos != 0, i.e. any signed type
static const Pos begin_pos = numeric_limits<Pos>::min();
// Run a binary search looking for the highest key below target_pos.
// Initial upper bound of the search is target_pos.
// Find initial lower bound by doubling the size of the range until a key below target_pos
// is found, or the lower bound reaches the beginning of the search space.
// If the lower bound search reaches the beginning of the search space without finding a key,
// return the beginning of the search space; otherwise, perform a binary search between
// the bounds now established.
Pos lower_bound = 0;
Pos upper_bound = target_pos;
bool found_low = false;
Pos probe_pos = target_pos;
// We need one loop for each bit of the search space to find the lower bound,
// one loop for each bit of the search space to find the upper bound,
// and one extra loop to confirm the boundary is correct.
for (size_t loop_count = min(numeric_limits<Pos>::digits * size_t(2) + 1, max_loops); loop_count; --loop_count) {
DLOG("fetch(probe_pos = " << probe_pos << ", target_pos = " << target_pos << ")");
auto result = fetch(probe_pos, target_pos);
const Pos low_pos = result.empty() ? end_pos : *result.begin();
const Pos high_pos = result.empty() ? end_pos : *result.rbegin();
DLOG(" = " << low_pos << ".." << high_pos);
// check for correct behavior of the fetch function
THROW_CHECK2(out_of_range, high_pos, probe_pos, probe_pos <= high_pos);
THROW_CHECK2(out_of_range, low_pos, probe_pos, probe_pos <= low_pos);
THROW_CHECK2(out_of_range, low_pos, high_pos, low_pos <= high_pos);
if (!found_low) {
// if target_pos == end_pos then we will find it in every empty result set,
// so in that case we force the lower bound to be lower than end_pos
if ((target_pos == end_pos) ? (low_pos < target_pos) : (low_pos <= target_pos)) {
// found a lower bound, set the low bound there and switch to binary search
found_low = true;
lower_bound = low_pos;
DLOG("found_low = true, lower_bound = " << lower_bound);
} else {
// still looking for lower bound
// if probe_pos was begin_pos then we can stop with no result
if (probe_pos == begin_pos) {
DLOG("return: probe_pos == begin_pos " << begin_pos);
return begin_pos;
}
// double the range size, or use the distance between objects found so far
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
// already checked low_pos <= high_pos above
const Pos want_delta = max(upper_bound - probe_pos, min_step);
// avoid underflowing the beginning of the search space
const Pos have_delta = min(want_delta, probe_pos - begin_pos);
THROW_CHECK2(out_of_range, want_delta, have_delta, have_delta <= want_delta);
// move probe and try again
probe_pos = probe_pos - have_delta;
DLOG("probe_pos " << probe_pos << " = probe_pos - have_delta " << have_delta << " (want_delta " << want_delta << ")");
continue;
static const Pos end_pos = numeric_limits<Pos>::max();
// TBH this probably won't work if begin_pos != 0, i.e. any signed type
static const Pos begin_pos = numeric_limits<Pos>::min();
// Run a binary search looking for the highest key below target_pos.
// Initial upper bound of the search is target_pos.
// Find initial lower bound by doubling the size of the range until a key below target_pos
// is found, or the lower bound reaches the beginning of the search space.
// If the lower bound search reaches the beginning of the search space without finding a key,
// return the beginning of the search space; otherwise, perform a binary search between
// the bounds now established.
Pos lower_bound = 0;
Pos upper_bound = target_pos;
bool found_low = false;
Pos probe_pos = target_pos;
// We need one loop for each bit of the search space to find the lower bound,
// one loop for each bit of the search space to find the upper bound,
// and one extra loop to confirm the boundary is correct.
for (size_t loop_count = min((1 + numeric_limits<Pos>::digits) * size_t(2), max_loops); loop_count; --loop_count) {
SEEKER_DEBUG_LOG("fetch(probe_pos = " << probe_pos << ", target_pos = " << target_pos << ")");
auto result = fetch(probe_pos, target_pos);
const Pos low_pos = result.empty() ? end_pos : *result.begin();
const Pos high_pos = result.empty() ? end_pos : *result.rbegin();
SEEKER_DEBUG_LOG(" = " << low_pos << ".." << high_pos);
// check for correct behavior of the fetch function
THROW_CHECK2(out_of_range, high_pos, probe_pos, probe_pos <= high_pos);
THROW_CHECK2(out_of_range, low_pos, probe_pos, probe_pos <= low_pos);
THROW_CHECK2(out_of_range, low_pos, high_pos, low_pos <= high_pos);
if (!found_low) {
// if target_pos == end_pos then we will find it in every empty result set,
// so in that case we force the lower bound to be lower than end_pos
if ((target_pos == end_pos) ? (low_pos < target_pos) : (low_pos <= target_pos)) {
// found a lower bound, set the low bound there and switch to binary search
found_low = true;
lower_bound = low_pos;
SEEKER_DEBUG_LOG("found_low = true, lower_bound = " << lower_bound);
} else {
// still looking for lower bound
// if probe_pos was begin_pos then we can stop with no result
if (probe_pos == begin_pos) {
SEEKER_DEBUG_LOG("return: probe_pos == begin_pos " << begin_pos);
return begin_pos;
}
// double the range size, or use the distance between objects found so far
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
// already checked low_pos <= high_pos above
const Pos want_delta = max(upper_bound - probe_pos, min_step);
// avoid underflowing the beginning of the search space
const Pos have_delta = min(want_delta, probe_pos - begin_pos);
THROW_CHECK2(out_of_range, want_delta, have_delta, have_delta <= want_delta);
// move probe and try again
probe_pos = probe_pos - have_delta;
SEEKER_DEBUG_LOG("probe_pos " << probe_pos << " = probe_pos - have_delta " << have_delta << " (want_delta " << want_delta << ")");
continue;
}
if (low_pos <= target_pos && target_pos <= high_pos) {
// have keys on either side of target_pos in result
// search from the high end until we find the highest key below target
for (auto i = result.rbegin(); i != result.rend(); ++i) {
// more correctness checking for fetch
THROW_CHECK2(out_of_range, *i, probe_pos, probe_pos <= *i);
if (*i <= target_pos) {
DLOG("return: *i " << *i << " <= target_pos " << target_pos);
return *i;
}
}
// if the list is empty then low_pos = high_pos = end_pos
// if target_pos = end_pos also, then we will execute the loop
// above but not find any matching entries.
THROW_CHECK0(runtime_error, result.empty());
}
if (target_pos <= low_pos) {
// results are all too high, so probe_pos..low_pos is too high
// lower the high bound to the probe pos
upper_bound = probe_pos;
DLOG("upper_bound = probe_pos " << probe_pos);
}
if (high_pos < target_pos) {
// results are all too low, so probe_pos..high_pos is too low
// raise the low bound to the high_pos
DLOG("lower_bound = high_pos " << high_pos);
lower_bound = high_pos;
}
// compute a new probe pos at the middle of the range and try again
// we can't have a zero-size range here because we would not have set found_low yet
THROW_CHECK2(out_of_range, lower_bound, upper_bound, lower_bound <= upper_bound);
const Pos delta = (upper_bound - lower_bound) / 2;
probe_pos = lower_bound + delta;
if (delta < 1) {
// nothing can exist in the range (lower_bound, upper_bound)
// and an object is known to exist at lower_bound
DLOG("return: probe_pos == lower_bound " << lower_bound);
return lower_bound;
}
THROW_CHECK2(out_of_range, lower_bound, probe_pos, lower_bound <= probe_pos);
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
DLOG("loop: lower_bound " << lower_bound << ", probe_pos " << probe_pos << ", upper_bound " << upper_bound);
}
THROW_ERROR(runtime_error, "FIXME: should not reach this line: "
"lower_bound..upper_bound " << lower_bound << ".." << upper_bound << ", "
"found_low " << found_low);
} catch (...) {
DOUT(cerr);
throw;
if (low_pos <= target_pos && target_pos <= high_pos) {
// have keys on either side of target_pos in result
// search from the high end until we find the highest key below target
for (auto i = result.rbegin(); i != result.rend(); ++i) {
// more correctness checking for fetch
THROW_CHECK2(out_of_range, *i, probe_pos, probe_pos <= *i);
if (*i <= target_pos) {
SEEKER_DEBUG_LOG("return: *i " << *i << " <= target_pos " << target_pos);
return *i;
}
}
// if the list is empty then low_pos = high_pos = end_pos
// if target_pos = end_pos also, then we will execute the loop
// above but not find any matching entries.
THROW_CHECK0(runtime_error, result.empty());
}
if (target_pos <= low_pos) {
// results are all too high, so probe_pos..low_pos is too high
// lower the high bound to the probe pos, low_pos cannot be lower
SEEKER_DEBUG_LOG("upper_bound = probe_pos " << probe_pos);
upper_bound = probe_pos;
}
if (high_pos < target_pos) {
// results are all too low, so probe_pos..high_pos is too low
// raise the low bound to high_pos but not above upper_bound
const auto next_pos = min(high_pos, upper_bound);
SEEKER_DEBUG_LOG("lower_bound = next_pos " << next_pos);
lower_bound = next_pos;
}
// compute a new probe pos at the middle of the range and try again
// we can't have a zero-size range here because we would not have set found_low yet
THROW_CHECK2(out_of_range, lower_bound, upper_bound, lower_bound <= upper_bound);
const Pos delta = (upper_bound - lower_bound) / 2;
probe_pos = lower_bound + delta;
if (delta < 1) {
// nothing can exist in the range (lower_bound, upper_bound)
// and an object is known to exist at lower_bound
SEEKER_DEBUG_LOG("return: probe_pos == lower_bound " << lower_bound);
return lower_bound;
}
THROW_CHECK2(out_of_range, lower_bound, probe_pos, lower_bound <= probe_pos);
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
SEEKER_DEBUG_LOG("loop bottom: lower_bound " << lower_bound << ", probe_pos " << probe_pos << ", upper_bound " << upper_bound);
}
THROW_ERROR(runtime_error, "FIXME: should not reach this line: "
"lower_bound..upper_bound " << lower_bound << ".." << upper_bound << ", "
"found_low " << found_low);
}
}

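As a reading aid for `seek_backward` above: it returns the highest key at or below `target_pos`, or the beginning of the search space when no key qualifies. A naive linear-time reference (illustrative, not part of bees) states that contract compactly:

```cpp
#include <cstdint>
#include <set>

// Linear reference for seek_backward's contract over unsigned keys:
// highest key <= target, or 0 (begin_pos) when every key is above target.
uint64_t seek_backward_ref(const std::set<uint64_t> &keys, uint64_t target) {
	uint64_t best = 0;  // begin_pos == numeric_limits<uint64_t>::min()
	for (const auto k : keys) {
		if (k <= target && k >= best)
			best = k;
	}
	return best;
}
```

The binary search above computes the same answer, but with O(log n) calls to the fetch function instead of a full scan.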
View File

@@ -17,6 +17,7 @@ CRUCIBLE_OBJS = \
openat2.o \
path.o \
process.o \
seeker.o \
string.o \
table.o \
task.o \

View File

@@ -5,6 +5,12 @@
#include "crucible/hexdump.h"
#include "crucible/seeker.h"
#define CRUCIBLE_BTRFS_TREE_DEBUG(x) do { \
if (BtrfsIoctlSearchKey::s_debug_ostream) { \
(*BtrfsIoctlSearchKey::s_debug_ostream) << x; \
} \
} while (false)
namespace crucible {
using namespace std;
@@ -355,6 +361,7 @@ namespace crucible {
BtrfsTreeItem
BtrfsTreeFetcher::at(uint64_t logical)
{
CRUCIBLE_BTRFS_TREE_DEBUG("at " << logical);
BtrfsIoctlSearchKey &sk = m_sk;
fill_sk(sk, logical);
// Exact match, should return 0 or 1 items
@@ -397,53 +404,59 @@ namespace crucible {
BtrfsTreeFetcher::rlower_bound(uint64_t logical)
{
#if 0
#define BTFRLB_DEBUG(x) do { cerr << x; } while (false)
static bool btfrlb_debug = getenv("BTFLRB_DEBUG");
#define BTFRLB_DEBUG(x) do { if (btfrlb_debug) cerr << x; } while (false)
#else
#define BTFRLB_DEBUG(x) do { } while (false)
#define BTFRLB_DEBUG(x) CRUCIBLE_BTRFS_TREE_DEBUG(x)
#endif
BtrfsTreeItem closest_item;
uint64_t closest_logical = 0;
BtrfsIoctlSearchKey &sk = m_sk;
size_t loops = 0;
BTFRLB_DEBUG("rlower_bound: " << to_hex(logical) << endl);
seek_backward(scale_logical(logical), [&](uint64_t lower_bound, uint64_t upper_bound) {
BTFRLB_DEBUG("rlower_bound: " << to_hex(logical) << " in tree " << tree() << endl);
seek_backward(scale_logical(logical), [&](uint64_t const lower_bound, uint64_t const upper_bound) {
++loops;
fill_sk(sk, unscale_logical(min(scaled_max_logical(), lower_bound)));
set<uint64_t> rv;
bool too_far = false;
do {
sk.nr_items = 4;
sk.do_ioctl(fd());
BTFRLB_DEBUG("fetch: loop " << loops << " lower_bound..upper_bound " << to_hex(lower_bound) << ".." << to_hex(upper_bound));
for (auto &i : sk.m_result) {
next_sk(sk, i);
const auto this_logical = hdr_logical(i);
const auto scaled_hdr_logical = scale_logical(this_logical);
BTFRLB_DEBUG(" " << to_hex(scaled_hdr_logical));
if (hdr_match(i)) {
if (this_logical <= logical && this_logical > closest_logical) {
closest_logical = this_logical;
closest_item = i;
}
BTFRLB_DEBUG("(match)");
rv.insert(scaled_hdr_logical);
}
if (scaled_hdr_logical > upper_bound || hdr_stop(i)) {
if (scaled_hdr_logical >= upper_bound) {
BTFRLB_DEBUG("(" << to_hex(scaled_hdr_logical) << " >= " << to_hex(upper_bound) << ")");
}
if (hdr_stop(i)) {
rv.insert(numeric_limits<uint64_t>::max());
BTFRLB_DEBUG("(stop)");
}
// If hdr_stop or !hdr_match, don't inspect the item
if (hdr_stop(i)) {
too_far = true;
rv.insert(numeric_limits<uint64_t>::max());
BTFRLB_DEBUG("(stop)");
break;
} else {
BTFRLB_DEBUG("(cont'd)");
}
if (!hdr_match(i)) {
BTFRLB_DEBUG("(no match)");
continue;
}
const auto this_logical = hdr_logical(i);
BTFRLB_DEBUG(" " << to_hex(this_logical) << " " << i);
const auto scaled_hdr_logical = scale_logical(this_logical);
BTFRLB_DEBUG(" " << "(match)");
if (scaled_hdr_logical > upper_bound) {
too_far = true;
BTFRLB_DEBUG("(" << to_hex(scaled_hdr_logical) << " >= " << to_hex(upper_bound) << ")");
break;
}
if (this_logical <= logical && this_logical > closest_logical) {
closest_logical = this_logical;
closest_item = i;
BTFRLB_DEBUG("(closest)");
}
rv.insert(scaled_hdr_logical);
BTFRLB_DEBUG("(cont'd)");
}
BTFRLB_DEBUG(endl);
// We might get a search result that contains only non-matching items.
// Keep looping until we find any matching item or we run out of tree.
} while (rv.empty() && !sk.m_result.empty());
} while (!too_far && rv.empty() && !sk.m_result.empty());
return rv;
}, scale_logical(lookbehind_size()));
return closest_item;
@@ -474,6 +487,7 @@ namespace crucible {
BtrfsTreeItem
BtrfsTreeFetcher::next(uint64_t logical)
{
CRUCIBLE_BTRFS_TREE_DEBUG("next " << logical);
const auto scaled_logical = scale_logical(logical);
if (scaled_logical + 1 > scaled_max_logical()) {
return BtrfsTreeItem();
@@ -484,6 +498,7 @@ namespace crucible {
BtrfsTreeItem
BtrfsTreeFetcher::prev(uint64_t logical)
{
CRUCIBLE_BTRFS_TREE_DEBUG("prev " << logical);
const auto scaled_logical = scale_logical(logical);
if (scaled_logical < 1) {
return BtrfsTreeItem();
@@ -568,9 +583,10 @@ namespace crucible {
BtrfsCsumTreeFetcher::get_sums(uint64_t const logical, size_t count, function<void(uint64_t logical, const uint8_t *buf, size_t bytes)> output)
{
#if 0
#define BCTFGS_DEBUG(x) do { cerr << x; } while (false)
static bool bctfgs_debug = getenv("BCTFGS_DEBUG");
#define BCTFGS_DEBUG(x) do { if (bctfgs_debug) cerr << x; } while (false)
#else
#define BCTFGS_DEBUG(x) do { } while (false)
#define BCTFGS_DEBUG(x) CRUCIBLE_BTRFS_TREE_DEBUG(x)
#endif
const uint64_t logical_end = logical + count * block_size();
BtrfsTreeItem bti = rlower_bound(logical);
@@ -662,14 +678,6 @@ namespace crucible {
type(BTRFS_EXTENT_DATA_KEY);
}
BtrfsFsTreeFetcher::BtrfsFsTreeFetcher(const Fd &new_fd, uint64_t subvol) :
BtrfsTreeObjectFetcher(new_fd)
{
tree(subvol);
type(BTRFS_EXTENT_DATA_KEY);
scale_size(1);
}
BtrfsInodeFetcher::BtrfsInodeFetcher(const Fd &fd) :
BtrfsTreeObjectFetcher(fd)
{
@@ -693,18 +701,86 @@ namespace crucible {
BtrfsTreeObjectFetcher(fd)
{
tree(BTRFS_ROOT_TREE_OBJECTID);
type(BTRFS_ROOT_ITEM_KEY);
scale_size(1);
}
BtrfsTreeItem
BtrfsRootFetcher::root(uint64_t subvol)
BtrfsRootFetcher::root(const uint64_t subvol)
{
const auto my_type = BTRFS_ROOT_ITEM_KEY;
type(my_type);
const auto item = at(subvol);
if (!!item) {
THROW_CHECK2(runtime_error, item.objectid(), subvol, subvol == item.objectid());
THROW_CHECK2(runtime_error, item.type(), BTRFS_ROOT_ITEM_KEY, item.type() == BTRFS_ROOT_ITEM_KEY);
THROW_CHECK2(runtime_error, item.type(), my_type, item.type() == my_type);
}
return item;
}
BtrfsTreeItem
BtrfsRootFetcher::root_backref(const uint64_t subvol)
{
const auto my_type = BTRFS_ROOT_BACKREF_KEY;
type(my_type);
const auto item = at(subvol);
if (!!item) {
THROW_CHECK2(runtime_error, item.objectid(), subvol, subvol == item.objectid());
THROW_CHECK2(runtime_error, item.type(), my_type, item.type() == my_type);
}
return item;
}
BtrfsDataExtentTreeFetcher::BtrfsDataExtentTreeFetcher(const Fd &fd) :
BtrfsExtentItemFetcher(fd),
m_chunk_tree(fd)
{
tree(BTRFS_EXTENT_TREE_OBJECTID);
type(BTRFS_EXTENT_ITEM_KEY);
m_chunk_tree.tree(BTRFS_CHUNK_TREE_OBJECTID);
m_chunk_tree.type(BTRFS_CHUNK_ITEM_KEY);
m_chunk_tree.objectid(BTRFS_FIRST_CHUNK_TREE_OBJECTID);
}
void
BtrfsDataExtentTreeFetcher::next_sk(BtrfsIoctlSearchKey &key, const BtrfsIoctlSearchHeader &hdr)
{
key.min_type = key.max_type = type();
key.max_objectid = key.max_offset = numeric_limits<uint64_t>::max();
key.min_offset = 0;
key.min_objectid = hdr.objectid;
const auto step = scale_size();
if (key.min_objectid < numeric_limits<uint64_t>::max() - step) {
key.min_objectid += step;
} else {
key.min_objectid = numeric_limits<uint64_t>::max();
}
// If we're still in our current block group, check here
if (!!m_current_bg) {
const auto bg_begin = m_current_bg.offset();
const auto bg_end = bg_begin + m_current_bg.chunk_length();
// If we are still in our current block group, return early
if (key.min_objectid >= bg_begin && key.min_objectid < bg_end) return;
}
// We don't have a current block group or we're out of range
// Find the chunk that this bytenr belongs to
m_current_bg = m_chunk_tree.rlower_bound(key.min_objectid);
// Make sure it's a data block group
while (!!m_current_bg) {
// Data block group, stop here
if (m_current_bg.chunk_type() & BTRFS_BLOCK_GROUP_DATA) break;
// Not a data block group, skip to end
key.min_objectid = m_current_bg.offset() + m_current_bg.chunk_length();
m_current_bg = m_chunk_tree.lower_bound(key.min_objectid);
}
if (!m_current_bg) {
// Ran out of data block groups, stop here
return;
}
// Check to see if bytenr is in the current data block group
const auto bg_begin = m_current_bg.offset();
if (key.min_objectid < bg_begin) {
// Move forward to start of data block group
key.min_objectid = bg_begin;
}
}
}


@@ -757,6 +757,7 @@ namespace crucible {
thread_local size_t BtrfsIoctlSearchKey::s_calls = 0;
thread_local size_t BtrfsIoctlSearchKey::s_loops = 0;
thread_local size_t BtrfsIoctlSearchKey::s_loops_empty = 0;
thread_local shared_ptr<ostream> BtrfsIoctlSearchKey::s_debug_ostream;
bool
BtrfsIoctlSearchKey::do_ioctl_nothrow(int fd)
@@ -776,6 +777,9 @@ namespace crucible {
ioctl_ptr = ioctl_arg.get<btrfs_ioctl_search_args_v2>();
ioctl_ptr->key = static_cast<const btrfs_ioctl_search_key&>(*this);
ioctl_ptr->buf_size = buf_size;
if (s_debug_ostream) {
(*s_debug_ostream) << "bisk " << (ioctl_ptr->key) << "\n";
}
// Don't bother supporting V1. Kernels that old have other problems.
int rv = ioctl(fd, BTRFS_IOC_TREE_SEARCH_V2, ioctl_arg.data());
++s_calls;
@@ -881,6 +885,26 @@ namespace crucible {
}
}
string
btrfs_chunk_type_ntoa(uint64_t type)
{
static const bits_ntoa_table table[] = {
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_DATA),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_METADATA),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_SYSTEM),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_DUP),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID0),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID1),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID10),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID1C3),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID1C4),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID5),
NTOA_TABLE_ENTRY_BITS(BTRFS_BLOCK_GROUP_RAID6),
NTOA_TABLE_ENTRY_END()
};
return bits_ntoa(type, table);
}
string
btrfs_search_type_ntoa(unsigned type)
{
@@ -908,15 +932,9 @@ namespace crucible {
NTOA_TABLE_ENTRY_ENUM(BTRFS_SHARED_BLOCK_REF_KEY),
NTOA_TABLE_ENTRY_ENUM(BTRFS_SHARED_DATA_REF_KEY),
NTOA_TABLE_ENTRY_ENUM(BTRFS_BLOCK_GROUP_ITEM_KEY),
#ifdef BTRFS_FREE_SPACE_INFO_KEY
NTOA_TABLE_ENTRY_ENUM(BTRFS_FREE_SPACE_INFO_KEY),
#endif
#ifdef BTRFS_FREE_SPACE_EXTENT_KEY
NTOA_TABLE_ENTRY_ENUM(BTRFS_FREE_SPACE_EXTENT_KEY),
#endif
#ifdef BTRFS_FREE_SPACE_BITMAP_KEY
NTOA_TABLE_ENTRY_ENUM(BTRFS_FREE_SPACE_BITMAP_KEY),
#endif
NTOA_TABLE_ENTRY_ENUM(BTRFS_DEV_EXTENT_KEY),
NTOA_TABLE_ENTRY_ENUM(BTRFS_DEV_ITEM_KEY),
NTOA_TABLE_ENTRY_ENUM(BTRFS_CHUNK_ITEM_KEY),
@@ -948,9 +966,7 @@ namespace crucible {
NTOA_TABLE_ENTRY_ENUM(BTRFS_CSUM_TREE_OBJECTID),
NTOA_TABLE_ENTRY_ENUM(BTRFS_QUOTA_TREE_OBJECTID),
NTOA_TABLE_ENTRY_ENUM(BTRFS_UUID_TREE_OBJECTID),
#ifdef BTRFS_FREE_SPACE_TREE_OBJECTID
NTOA_TABLE_ENTRY_ENUM(BTRFS_FREE_SPACE_TREE_OBJECTID),
#endif
NTOA_TABLE_ENTRY_ENUM(BTRFS_BALANCE_OBJECTID),
NTOA_TABLE_ENTRY_ENUM(BTRFS_ORPHAN_OBJECTID),
NTOA_TABLE_ENTRY_ENUM(BTRFS_TREE_LOG_OBJECTID),
@@ -1138,11 +1154,17 @@ namespace crucible {
{
}
void
BtrfsIoctlFsInfoArgs::do_ioctl(int fd)
bool
BtrfsIoctlFsInfoArgs::do_ioctl_nothrow(int const fd)
{
btrfs_ioctl_fs_info_args_v3 *p = static_cast<btrfs_ioctl_fs_info_args_v3 *>(this);
if (ioctl(fd, BTRFS_IOC_FS_INFO, p)) {
return 0 == ioctl(fd, BTRFS_IOC_FS_INFO, p);
}
void
BtrfsIoctlFsInfoArgs::do_ioctl(int const fd)
{
if (!do_ioctl_nothrow(fd)) {
THROW_ERRNO("BTRFS_IOC_FS_INFO: fd " << fd);
}
}
@@ -1159,6 +1181,13 @@ namespace crucible {
return this->btrfs_ioctl_fs_info_args_v3::csum_size;
}
vector<uint8_t>
BtrfsIoctlFsInfoArgs::fsid() const
{
const auto begin = btrfs_ioctl_fs_info_args_v3::fsid;
return vector<uint8_t>(begin, begin + BTRFS_FSID_SIZE);
}
uint64_t
BtrfsIoctlFsInfoArgs::generation() const
{


@@ -1,5 +1,27 @@
#include "crucible/openat2.h"
#include <sys/syscall.h>
// Compatibility for building on old libc for new kernel
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 6, 0)
// Every arch that defines this uses 437, except Alpha, where 437 is
// mq_getsetattr.
#ifndef SYS_openat2
#ifdef __alpha__
#define SYS_openat2 547
#else
#define SYS_openat2 437
#endif
#endif
#endif // Linux version >= v5.6
#include <fcntl.h>
#include <unistd.h>
extern "C" {
int
@@ -7,7 +29,12 @@ __attribute__((weak))
openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
throw()
{
#ifdef SYS_openat2
return syscall(SYS_openat2, dirfd, pathname, how, size);
#else
errno = ENOSYS;
return -1;
#endif
}
};

lib/seeker.cc (new file)

@@ -0,0 +1,7 @@
#include "crucible/seeker.h"
namespace crucible {
thread_local shared_ptr<ostream> tl_seeker_debug_str;
};


@@ -1,5 +1,13 @@
#!/bin/bash
# if not called from systemd try to replicate mount unsharing on ctrl+c
# see: https://github.com/Zygo/bees/issues/281
if [ -z "${SYSTEMD_EXEC_PID}" -a -z "${UNSHARE_DONE}" ]; then
UNSHARE_DONE=true
export UNSHARE_DONE
exec unshare -m --propagation private -- "$0" "$@"
fi
## Helpful functions
INFO(){ echo "INFO:" "$@"; }
ERRO(){ echo "ERROR:" "$@"; exit 1; }
@@ -108,13 +116,11 @@ mkdir -p "$WORK_DIR" || exit 1
INFO "MOUNT DIR: $MNT_DIR"
mkdir -p "$MNT_DIR" || exit 1
mount --make-private -osubvolid=5 /dev/disk/by-uuid/$UUID "$MNT_DIR" || exit 1
mount --make-private -osubvolid=5,nodev,noexec /dev/disk/by-uuid/$UUID "$MNT_DIR" || exit 1
if [ ! -d "$BEESHOME" ]; then
INFO "Create subvol $BEESHOME for store bees data"
btrfs sub cre "$BEESHOME"
else
btrfs sub show "$BEESHOME" &> /dev/null || ERRO "$BEESHOME MUST BE A SUBVOL!"
fi
# Check DB size


@@ -17,6 +17,7 @@ KillSignal=SIGTERM
MemoryAccounting=true
Nice=19
Restart=on-abnormal
RuntimeDirectoryMode=0700
RuntimeDirectory=bees
StartupCPUWeight=25
StartupIOWeight=25


@@ -230,8 +230,10 @@ BeesContext::dedup(const BeesRangePair &brp_in)
BeesAddress first_addr(brp.first.fd(), brp.first.begin());
BeesAddress second_addr(brp.second.fd(), brp.second.begin());
if (first_addr.get_physical_or_zero() == second_addr.get_physical_or_zero()) {
BEESLOGTRACE("equal physical addresses in dedup");
const auto first_gpoz = first_addr.get_physical_or_zero();
const auto second_gpoz = second_addr.get_physical_or_zero();
if (first_gpoz == second_gpoz) {
BEESLOGDEBUG("equal physical addresses " << first_addr << " and " << second_addr << " in dedup");
BEESCOUNT(bug_dedup_same_physical);
}
@@ -259,7 +261,7 @@ BeesContext::dedup(const BeesRangePair &brp_in)
BEESCOUNTADD(dedup_bytes, brp.first.size());
} else {
BEESCOUNT(dedup_miss);
BEESLOGWARN("NO Dedup! " << brp);
BEESLOGINFO("NO Dedup! " << brp);
}
lock.reset();
@@ -373,7 +375,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
Extent::OBSCURED | Extent::PREALLOC
)) {
BEESCOUNT(scan_interesting);
BEESLOGWARN("Interesting extent flags " << e << " from fd " << name_fd(bfr.fd()));
BEESLOGINFO("Interesting extent flags " << e << " from fd " << name_fd(bfr.fd()));
}
if (e.flags() & Extent::HOLE) {
@@ -385,7 +387,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
if (e.flags() & Extent::PREALLOC) {
// Prealloc is all zero and we replace it with a hole.
// No special handling is required here. Nuke it and move on.
BEESLOGINFO("prealloc extent " << e);
BEESLOGINFO("prealloc extent " << e << " in " << bfr);
// Must not extend past EOF
auto extent_size = min(e.end(), bfr.file_size()) - e.begin();
// Must hold tmpfile until dedupe is done
@@ -534,7 +536,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
// Hash is toxic
if (found_addr.is_toxic()) {
BEESLOGWARN("WORKAROUND: abandoned toxic match for hash " << hash << " addr " << found_addr << " matching bbd " << bbd);
BEESLOGDEBUG("WORKAROUND: abandoned toxic match for hash " << hash << " addr " << found_addr << " matching bbd " << bbd);
// Don't push these back in because we'll never delete them.
// Extents may become non-toxic so give them a chance to expire.
// hash_table->push_front_hash_addr(hash, found_addr);
@@ -556,7 +558,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
BeesResolver resolved(m_ctx, found_addr);
// Toxic extents are really toxic
if (resolved.is_toxic()) {
BEESLOGWARN("WORKAROUND: discovered toxic match at found_addr " << found_addr << " matching bbd " << bbd);
BEESLOGDEBUG("WORKAROUND: discovered toxic match at found_addr " << found_addr << " matching bbd " << bbd);
BEESCOUNT(scan_toxic_match);
// Make sure we never see this hash again.
// It has become toxic since it was inserted into the hash table.
@@ -917,7 +919,7 @@ BeesContext::scan_forward(const BeesFileRange &bfr_in)
// Sanity check
if (bfr.begin() >= bfr.file_size()) {
BEESLOGWARN("past EOF: " << bfr);
BEESLOGDEBUG("past EOF: " << bfr);
BEESCOUNT(scanf_eof);
return false;
}


@@ -797,7 +797,7 @@ BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t
for (auto fp = madv_flags; fp->value; ++fp) {
BEESTOOLONG("madvise(" << fp->name << ")");
if (madvise(m_byte_ptr, m_size, fp->value)) {
BEESLOGWARN("madvise(..., " << fp->name << "): " << strerror(errno) << " (ignored)");
BEESLOGNOTICE("madvise(..., " << fp->name << "): " << strerror(errno) << " (ignored)");
}
}
@@ -811,8 +811,19 @@ BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t
prefetch_loop();
});
// Blacklist might fail if the hash table is not stored on a btrfs
// Blacklist might fail if the hash table is not stored on a btrfs,
// or if it's on a _different_ btrfs
catch_all([&]() {
// Root is definitely a btrfs
BtrfsIoctlFsInfoArgs root_info;
root_info.do_ioctl(m_ctx->root_fd());
// Hash might not be a btrfs
BtrfsIoctlFsInfoArgs hash_info;
// If btrfs fs_info ioctl fails, it must be a different fs
if (!hash_info.do_ioctl_nothrow(m_fd)) return;
// If Hash is a btrfs, Root must be the same one
if (root_info.fsid() != hash_info.fsid()) return;
// Hash is on the same one, blacklist it
m_ctx->blacklist_insert(BeesFileId(m_fd));
});
}

File diff suppressed because it is too large.


@@ -8,38 +8,32 @@ thread_local BeesTracer *BeesTracer::tl_next_tracer = nullptr;
thread_local bool BeesTracer::tl_first = true;
thread_local bool BeesTracer::tl_silent = false;
bool
exception_check()
{
#if __cplusplus >= 201703
static
bool
exception_check()
{
return uncaught_exceptions();
}
#else
static
bool
exception_check()
{
return uncaught_exception();
}
#endif
}
BeesTracer::~BeesTracer()
{
if (!tl_silent && exception_check()) {
if (tl_first) {
BEESLOGNOTICE("--- BEGIN TRACE --- exception ---");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: --- BEGIN TRACE --- exception ---");
tl_first = false;
}
try {
m_func();
} catch (exception &e) {
BEESLOGNOTICE("Nested exception: " << e.what());
BEESLOG(BEES_TRACE_LEVEL, "TRACE: Nested exception: " << e.what());
} catch (...) {
BEESLOGNOTICE("Nested exception ...");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: Nested exception ...");
}
if (!m_next_tracer) {
BEESLOGNOTICE("--- END TRACE --- exception ---");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: --- END TRACE --- exception ---");
}
}
tl_next_tracer = m_next_tracer;
@@ -49,7 +43,7 @@ BeesTracer::~BeesTracer()
}
}
BeesTracer::BeesTracer(function<void()> f, bool silent) :
BeesTracer::BeesTracer(const function<void()> &f, bool silent) :
m_func(f)
{
m_next_tracer = tl_next_tracer;
@@ -61,12 +55,12 @@ void
BeesTracer::trace_now()
{
BeesTracer *tp = tl_next_tracer;
BEESLOGNOTICE("--- BEGIN TRACE ---");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: --- BEGIN TRACE ---");
while (tp) {
tp->m_func();
tp = tp->m_next_tracer;
}
BEESLOGNOTICE("--- END TRACE ---");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: --- END TRACE ---");
}
bool


@@ -457,7 +457,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
}
}
if (found_toxic) {
BEESLOGWARN("WORKAROUND: found toxic hash in " << first_bbd << " while extending backward:\n" << *this);
BEESLOGDEBUG("WORKAROUND: found toxic hash in " << first_bbd << " while extending backward:\n" << *this);
BEESCOUNT(pairbackward_toxic_hash);
break;
}
@@ -558,7 +558,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
}
}
if (found_toxic) {
BEESLOGWARN("WORKAROUND: found toxic hash in " << first_bbd << " while extending forward:\n" << *this);
BEESLOGDEBUG("WORKAROUND: found toxic hash in " << first_bbd << " while extending forward:\n" << *this);
BEESCOUNT(pairforward_toxic_hash);
break;
}
@@ -572,7 +572,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
}
if (first.overlaps(second)) {
BEESLOGTRACE("after grow, first " << first << "\n\toverlaps " << second);
BEESLOGDEBUG("after grow, first " << first << "\n\toverlaps " << second);
BEESCOUNT(bug_grow_pair_overlaps);
}
@@ -674,7 +674,7 @@ BeesAddress::magic_check(uint64_t flags)
static const unsigned recognized_flags = compressed_flags | delalloc_flags | ignore_flags | unusable_flags;
if (flags & ~recognized_flags) {
BEESLOGTRACE("Unrecognized flags in " << fiemap_extent_flags_ntoa(flags));
BEESLOGNOTICE("Unrecognized flags in " << fiemap_extent_flags_ntoa(flags));
m_addr = UNUSABLE;
// maybe we throw here?
BEESCOUNT(addr_unrecognized);


@@ -4,6 +4,7 @@
#include "crucible/process.h"
#include "crucible/string.h"
#include "crucible/task.h"
#include "crucible/uname.h"
#include <cctype>
#include <cmath>
@@ -11,17 +12,19 @@
#include <iostream>
#include <memory>
#include <regex>
#include <sstream>
// PRIx64
#include <inttypes.h>
#include <sched.h>
#include <sys/fanotify.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
// statfs
#include <linux/magic.h>
#include <sys/statfs.h>
// setrlimit
#include <sys/time.h>
#include <sys/resource.h>
@@ -198,7 +201,7 @@ BeesTooLong::check() const
if (age() > m_limit) {
ostringstream oss;
m_func(oss);
BEESLOGWARN("PERFORMANCE: " << *this << " sec: " << oss.str());
BEESLOGINFO("PERFORMANCE: " << *this << " sec: " << oss.str());
}
}
@@ -246,10 +249,6 @@ bees_readahead_nolock(int const fd, const off_t offset, const size_t size)
Timer readahead_timer;
BEESNOTE("readahead " << name_fd(fd) << " offset " << to_hex(offset) << " len " << pretty(size));
BEESTOOLONG("readahead " << name_fd(fd) << " offset " << to_hex(offset) << " len " << pretty(size));
#if 0
// In the kernel, readahead() is identical to posix_fadvise(..., POSIX_FADV_DONTNEED)
DIE_IF_NON_ZERO(readahead(fd, offset, size));
#else
// Make sure this data is in page cache by brute force
// The btrfs kernel code does readahead with lower ioprio
// and might discard the readahead request entirely.
@@ -263,13 +262,16 @@ bees_readahead_nolock(int const fd, const off_t offset, const size_t size)
// Ignore errors and short reads. It turns out our size
// parameter isn't all that accurate, so we can't use
// the pread_or_die template.
(void)!pread(fd, dummy, this_read_size, working_offset);
BEESCOUNT(readahead_count);
BEESCOUNTADD(readahead_bytes, this_read_size);
const auto pr_rv = pread(fd, dummy, this_read_size, working_offset);
if (pr_rv >= 0) {
BEESCOUNT(readahead_count);
BEESCOUNTADD(readahead_bytes, pr_rv);
} else {
BEESCOUNT(readahead_fail);
}
working_offset += this_read_size;
working_size -= this_read_size;
}
#endif
BEESCOUNTADD(readahead_ms, readahead_timer.age() * 1000);
}
@@ -392,6 +394,73 @@ BeesStringFile::read()
return read_string(fd, st.st_size);
}
static
void
bees_fsync(int const fd)
{
// Note that when btrfs renames a temporary over an existing file,
// it flushes the temporary, so we get the right behavior if we
// just do nothing here (except when the file is first created;
// however, in that case the result is the same as if the file
// did not exist, was empty, or was filled with garbage).
//
// Kernel versions prior to 5.16 had bugs which would put ghost
// dirents in $BEESHOME if there was a crash when we called
// fsync() here.
//
// Some other filesystems will throw our data away if we don't
// call fsync, so we do need to call fsync() on those filesystems.
//
// Newer btrfs kernel versions rely on fsync() to report
// unrecoverable write errors. If we don't check the fsync()
// result, we'll lose the data when we rename(). Kernel 6.2 added
// a number of new root causes for the class of "unrecoverable
// write errors" so we need to check this now.
BEESNOTE("checking filesystem type for " << name_fd(fd));
// LSB deprecated statfs without providing a replacement that
// can fill in the f_type field.
struct statfs stf = { 0 };
DIE_IF_NON_ZERO(fstatfs(fd, &stf));
if (stf.f_type != BTRFS_SUPER_MAGIC) {
BEESLOGONCE("Using fsync on non-btrfs filesystem type " << to_hex(stf.f_type));
BEESNOTE("fsync non-btrfs " << name_fd(fd));
DIE_IF_NON_ZERO(fsync(fd));
return;
}
static bool did_uname = false;
static bool do_fsync = false;
if (!did_uname) {
Uname uname;
const string version(uname.release);
static const regex version_re(R"/(^(\d+)\.(\d+)\.)/", regex::optimize | regex::ECMAScript);
smatch m;
// Last known bug in the fsync-rename use case was fixed in kernel 5.16
static const auto min_major = 5, min_minor = 16;
if (regex_search(version, m, version_re)) {
const auto major = stoul(m[1]);
const auto minor = stoul(m[2]);
if (tie(major, minor) > tie(min_major, min_minor)) {
BEESLOGONCE("Using fsync on btrfs because kernel version is " << major << "." << minor);
do_fsync = true;
} else {
BEESLOGONCE("Not using fsync on btrfs because kernel version is " << major << "." << minor);
}
} else {
BEESLOGONCE("Not using fsync on btrfs because can't parse kernel version '" << version << "'");
}
did_uname = true;
}
if (do_fsync) {
BEESNOTE("fsync btrfs " << name_fd(fd));
DIE_IF_NON_ZERO(fsync(fd));
}
}
void
BeesStringFile::write(string contents)
{
@@ -407,19 +476,8 @@ BeesStringFile::write(string contents)
Fd ofd = openat_or_die(m_dir_fd, tmpname, FLAGS_CREATE_FILE, S_IRUSR | S_IWUSR);
BEESNOTE("writing " << tmpname << " in " << name_fd(m_dir_fd));
write_or_die(ofd, contents);
#if 0
// This triggers too many btrfs bugs. I wish I was kidding.
// Forget snapshots, balance, compression, and dedupe:
// the system call you have to fear on btrfs is fsync().
// Also note that when bees renames a temporary over an
// existing file, it flushes the temporary, so we get
// the right behavior if we just do nothing here
// (except when the file is first created; however,
// in that case the result is the same as if the file
// did not exist, was empty, or was filled with garbage).
BEESNOTE("fsyncing " << tmpname << " in " << name_fd(m_dir_fd));
DIE_IF_NON_ZERO(fsync(ofd));
#endif
bees_fsync(ofd);
}
BEESNOTE("renaming " << tmpname << " to " << m_name << " in FD " << name_fd(m_dir_fd));
BEESTRACE("renaming " << tmpname << " to " << m_name << " in FD " << name_fd(m_dir_fd));
@@ -444,6 +502,19 @@ BeesTempFile::resize(off_t offset)
// Count time spent here
BEESCOUNTADD(tmp_resize_ms, resize_timer.age() * 1000);
// Modify flags - every time
// - btrfs will keep trying to set FS_NOCOMP_FL behind us when compression heuristics identify
// the data as compressible, but it fails to compress
// - clear FS_NOCOW_FL because we can only dedupe between files with the same FS_NOCOW_FL state,
// and we don't open FS_NOCOW_FL files for dedupe.
BEESTRACE("Getting FS_COMPR_FL and FS_NOCOMP_FL on m_fd " << name_fd(m_fd));
int flags = ioctl_iflags_get(m_fd);
flags |= FS_COMPR_FL;
flags &= ~(FS_NOCOMP_FL | FS_NOCOW_FL);
BEESTRACE("Setting FS_COMPR_FL and clearing FS_NOCOMP_FL | FS_NOCOW_FL on m_fd " << name_fd(m_fd) << " flags " << to_hex(flags));
ioctl_iflags_set(m_fd, flags);
// That may have queued some delayed ref deletes, so throttle them
bees_throttle(resize_timer.age(), "tmpfile_resize");
}
@@ -485,13 +556,6 @@ BeesTempFile::BeesTempFile(shared_ptr<BeesContext> ctx) :
// Add this file to open_root_ino lookup table
m_roots->insert_tmpfile(m_fd);
// Set compression attribute
BEESTRACE("Getting FS_COMPR_FL on m_fd " << name_fd(m_fd));
int flags = ioctl_iflags_get(m_fd);
flags |= FS_COMPR_FL;
BEESTRACE("Setting FS_COMPR_FL on m_fd " << name_fd(m_fd) << " flags " << to_hex(flags));
ioctl_iflags_set(m_fd, flags);
// Count time spent here
BEESCOUNTADD(tmp_create_ms, create_timer.age() * 1000);
@@ -683,7 +747,7 @@ bees_main(int argc, char *argv[])
BEESLOGDEBUG("exception (ignored): " << s);
BEESCOUNT(exception_caught_silent);
} else {
BEESLOGNOTICE("\n\n*** EXCEPTION ***\n\t" << s << "\n***\n");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: EXCEPTION: " << s);
BEESCOUNT(exception_caught);
}
});
@@ -704,9 +768,8 @@ bees_main(int argc, char *argv[])
shared_ptr<BeesContext> bc = make_shared<BeesContext>();
BEESLOGDEBUG("context constructed");
string cwd(readlink_or_die("/proc/self/cwd"));
// Defaults
bool use_relative_paths = false;
bool chatter_prefix_timestamp = true;
double thread_factor = 0;
unsigned thread_count = 0;
@@ -778,7 +841,7 @@ bees_main(int argc, char *argv[])
thread_min = stoul(optarg);
break;
case 'P':
crucible::set_relative_path(cwd);
use_relative_paths = true;
break;
case 'T':
chatter_prefix_timestamp = false;
@@ -796,7 +859,7 @@ bees_main(int argc, char *argv[])
root_scan_mode = static_cast<BeesRoots::ScanMode>(stoul(optarg));
break;
case 'p':
crucible::set_relative_path("");
use_relative_paths = false;
break;
case 't':
chatter_prefix_timestamp = true;
@@ -866,18 +929,19 @@ bees_main(int argc, char *argv[])
BEESLOGNOTICE("setting root path to '" << root_path << "'");
bc->set_root_path(root_path);
// Set path prefix
if (use_relative_paths) {
crucible::set_relative_path(name_fd(bc->root_fd()));
}
// Workaround for btrfs send
bc->roots()->set_workaround_btrfs_send(workaround_btrfs_send);
// Set root scan mode
bc->roots()->set_scan_mode(root_scan_mode);
if (root_scan_mode == BeesRoots::SCAN_MODE_EXTENT) {
MultiLocker::enable_locking(false);
} else {
// Workaround for a kernel bug that the subvol-based crawlers keep triggering
MultiLocker::enable_locking(true);
}
// Workaround for the logical-ino-vs-clone kernel bug
MultiLocker::enable_locking(true);
// Start crawlers
bc->start();


@@ -122,9 +122,9 @@ const int FLAGS_OPEN_FANOTIFY = O_RDWR | O_NOATIME | O_CLOEXEC | O_LARGEFILE;
// macros ----------------------------------------
#define BEESLOG(lv,x) do { if (lv < bees_log_level) { Chatter __chatter(lv, BeesNote::get_name()); __chatter << x; } } while (0)
#define BEESLOGTRACE(x) do { BEESLOG(LOG_DEBUG, x); BeesTracer::trace_now(); } while (0)
#define BEESTRACE(x) BeesTracer SRSLY_WTF_C(beesTracer_, __LINE__) ([&]() { BEESLOG(LOG_ERR, x << " at " << __FILE__ << ":" << __LINE__); })
#define BEES_TRACE_LEVEL LOG_DEBUG
#define BEESTRACE(x) BeesTracer SRSLY_WTF_C(beesTracer_, __LINE__) ([&]() { BEESLOG(BEES_TRACE_LEVEL, "TRACE: " << x << " at " << __FILE__ << ":" << __LINE__); })
#define BEESTOOLONG(x) BeesTooLong SRSLY_WTF_C(beesTooLong_, __LINE__) ([&](ostream &_btl_os) { _btl_os << x; })
#define BEESNOTE(x) BeesNote SRSLY_WTF_C(beesNote_, __LINE__) ([&](ostream &_btl_os) { _btl_os << x; })
@@ -134,6 +134,14 @@ const int FLAGS_OPEN_FANOTIFY = O_RDWR | O_NOATIME | O_CLOEXEC | O_LARGEFILE;
#define BEESLOGINFO(x) BEESLOG(LOG_INFO, x)
#define BEESLOGDEBUG(x) BEESLOG(LOG_DEBUG, x)
#define BEESLOGONCE(__x) do { \
static bool already_logged = false; \
if (!already_logged) { \
already_logged = true; \
BEESLOGNOTICE(__x); \
} \
} while (false)
#define BEESCOUNT(stat) do { \
BeesStats::s_global.add_count(#stat); \
} while (0)
@@ -185,7 +193,7 @@ class BeesTracer {
thread_local static bool tl_silent;
thread_local static bool tl_first;
public:
BeesTracer(function<void()> f, bool silent = false);
BeesTracer(const function<void()> &f, bool silent = false);
~BeesTracer();
static void trace_now();
static bool get_silent();
@@ -521,7 +529,7 @@ class BeesCrawl {
bool fetch_extents();
void fetch_extents_harder();
bool restart_crawl();
bool restart_crawl_unlocked();
BeesFileRange bti_to_bfr(const BtrfsTreeItem &bti) const;
public:
@@ -535,6 +543,7 @@ public:
void deferred(bool def_setting);
bool deferred() const;
bool finished() const;
bool restart_crawl();
};
class BeesScanMode;
@@ -543,7 +552,8 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
shared_ptr<BeesContext> m_ctx;
BeesStringFile m_crawl_state_file;
map<uint64_t, shared_ptr<BeesCrawl>> m_root_crawl_map;
using CrawlMap = map<uint64_t, shared_ptr<BeesCrawl>>;
CrawlMap m_root_crawl_map;
mutex m_mutex;
uint64_t m_crawl_dirty = 0;
uint64_t m_crawl_clean = 0;
@@ -562,7 +572,7 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
condition_variable m_stop_condvar;
bool m_stop_requested = false;
void insert_new_crawl();
CrawlMap insert_new_crawl();
Fd open_root_nocache(uint64_t root);
Fd open_root_ino_nocache(uint64_t root, uint64_t ino);
uint64_t transid_max_nocache();
@@ -578,13 +588,14 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
void current_state_set(const BeesCrawlState &bcs);
bool crawl_batch(shared_ptr<BeesCrawl> crawl);
void clear_caches();
friend class BeesScanModeExtent;
shared_ptr<BeesCrawl> insert_root(const BeesCrawlState &bcs);
bool up_to_date(const BeesCrawlState &bcs);
friend class BeesCrawl;
friend class BeesFdCache;
friend class BeesScanMode;
friend class BeesScanModeSubvol;
friend class BeesScanModeExtent;
public:
BeesRoots(shared_ptr<BeesContext> ctx);
@@ -890,5 +901,6 @@ void bees_readahead_pair(int fd, off_t offset, size_t size, int fd2, off_t offse
void bees_unreadahead(int fd, off_t offset, size_t size);
void bees_throttle(double time_used, const char *context);
string format_time(time_t t);
bool exception_check();
#endif


@@ -19,7 +19,9 @@ seeker_finder(const vector<uint64_t> &vec, uint64_t lower, uint64_t upper)
if (ub != s.end()) ++ub;
if (ub != s.end()) ++ub;
for (; ub != s.end(); ++ub) {
if (*ub > upper) break;
if (*ub > upper) {
break;
}
}
return set<uint64_t>(lb, ub);
}
@@ -28,7 +30,7 @@ static bool test_fails = false;
static
void
seeker_test(const vector<uint64_t> &vec, uint64_t const target)
seeker_test(const vector<uint64_t> &vec, uint64_t const target, bool const always_out = false)
{
cerr << "Find " << target << " in {";
for (auto i : vec) {
@@ -36,11 +38,13 @@ seeker_test(const vector<uint64_t> &vec, uint64_t const target)
}
cerr << " } = ";
size_t loops = 0;
tl_seeker_debug_str = make_shared<ostringstream>();
bool local_test_fails = false;
bool excepted = catch_all([&]() {
auto found = seek_backward(target, [&](uint64_t lower, uint64_t upper) {
const auto found = seek_backward(target, [&](uint64_t lower, uint64_t upper) {
++loops;
return seeker_finder(vec, lower, upper);
});
}, uint64_t(32));
cerr << found;
uint64_t my_found = 0;
for (auto i : vec) {
@@ -52,13 +56,15 @@ seeker_test(const vector<uint64_t> &vec, uint64_t const target)
cerr << " (correct)";
} else {
cerr << " (INCORRECT - right answer is " << my_found << ")";
test_fails = true;
local_test_fails = true;
}
});
cerr << " (" << loops << " loops)" << endl;
if (excepted) {
test_fails = true;
if (excepted || local_test_fails || always_out) {
cerr << dynamic_pointer_cast<ostringstream>(tl_seeker_debug_str)->str();
}
test_fails = test_fails || local_test_fails;
tl_seeker_debug_str.reset();
}
static
@@ -89,6 +95,39 @@ test_seeker()
seeker_test(vector<uint64_t> { 0, numeric_limits<uint64_t>::max() }, numeric_limits<uint64_t>::max());
seeker_test(vector<uint64_t> { 0, numeric_limits<uint64_t>::max() }, numeric_limits<uint64_t>::max() - 1);
seeker_test(vector<uint64_t> { 0, numeric_limits<uint64_t>::max() - 1 }, numeric_limits<uint64_t>::max());
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 0);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 1);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 2);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 3);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 4);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 5);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 6);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 7);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 8);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 9);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 1 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 2 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 3 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 4 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 5 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 6 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 7 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 8 );
// Pulled from a bees debug log
seeker_test(vector<uint64_t> {
6821962845,
6821962848,
6821963411,
6821963422,
6821963536,
6821963539,
6821963835, // <- appeared during the search, causing an exception
6821963841,
6822575316,
}, 6821971036, true);
}