mirror of https://github.com/Zygo/bees.git synced 2025-08-01 13:23:28 +02:00

32 Commits

Author SHA1 Message Date
Zygo Blaxell
ba11d733c0 readahead: flush the readahead cache based on time, not extent count
If the extent wasn't read in the last second, chances are high that
it was evicted from the page cache.  If the extents have been evicted
from the cache by the time we grow or dedupe them, we'll take a serious
performance hit as we read them back in, one page at a time.

Use a 5-second delay to match the default writeback interval.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-22 00:06:11 -04:00
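The idea can be sketched as a time-stamped cache that is flushed by age rather than by entry count.  This is a minimal illustration with hypothetical names (`ReadaheadCache`, `recently_read`); the real bees readahead cache is structured differently:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>

// Sketch: remember when each extent was last read ahead, and drop entries
// older than the writeback interval (5 seconds), since their pages have
// likely been evicted from the page cache by then.
struct ReadaheadCache {
	using clock = std::chrono::steady_clock;
	static constexpr auto max_age = std::chrono::seconds(5);
	std::map<uint64_t, clock::time_point> m_last_read; // bytenr -> last readahead time

	// True if the extent was read recently enough that its pages are
	// probably still cached, so readahead can be skipped.
	bool recently_read(uint64_t bytenr, clock::time_point now = clock::now()) {
		flush_old(now);
		return m_last_read.find(bytenr) != m_last_read.end();
	}

	void mark_read(uint64_t bytenr, clock::time_point now = clock::now()) {
		m_last_read[bytenr] = now;
	}

	// Flush based on time, not extent count.
	void flush_old(clock::time_point now) {
		for (auto it = m_last_read.begin(); it != m_last_read.end();) {
			if (now - it->second > max_age) it = m_last_read.erase(it);
			else ++it;
		}
	}
};
```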
Zygo Blaxell
e87f6e9649 readahead: ignore large and unproductive readahead requests
Sometimes there are absurdly large readahead requests (e.g. 32G),
which tie up a thread holding the readahead lock for a long time, not
to mention the IO with which the reads hammer the rest of the system.

These are likely an artifact of the legacy ExtentWalker code interacting
with concurrent filesystem changes.

The maximum btrfs extent size is 128M, so cap the length of readahead
requests at that size.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
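The cap amounts to a one-line clamp.  An illustrative helper, not the actual bees function:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// btrfs never creates an extent larger than 128M, so a readahead request
// longer than that cannot describe a single valid extent and is capped.
constexpr uint64_t max_btrfs_extent = 128ULL * 1024 * 1024;

constexpr uint64_t clamp_readahead(uint64_t length) {
	return std::min(length, max_btrfs_extent);
}
```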
Zygo Blaxell
fb63bd7e06 c++20: Implicit value sharing of this is deprecated in C++20
Fix the handful of instances.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit 4d6b21fb40174c3ecdc9e97670dae0dd22ce74a6)
2025-07-21 21:21:54 -04:00
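The deprecation in question is C++20's removal of implicit `this` capture through `[=]` (P0806R2).  A minimal sketch of the pattern and its fixes; this is not the actual bees code:

```cpp
#include <cassert>

struct Counter {
	int n = 0;
	auto by_reference() {
		// was: [=] { return n; }  (deprecated in C++20: implicit 'this')
		return [this] { return n; };  // capture the pointer explicitly
	}
	auto by_value() {
		return [*this] { return n; }; // C++17: capture a copy of *this
	}
};
```

The choice matters: `[this]` sees later mutations of the object, while `[*this]` freezes a snapshot at capture time.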
Zygo Blaxell
27b5b4e113 roots: filter out NODATASUM files before attempting to scan them
Add a cheap check for `FS_NOCOW_FL` when we first encounter
each extent.  In the raw btrfs inode flags, the offending flag is
`BTRFS_INODE_NODATASUM`: the restriction that prevents reflink between
datacow and nodatacow files is that a single inode may have csums on
all of its extents or on none of them, but cannot mix the two.

This extra check is cheaper than opening a file for each individual
reference to the extent, and then discovering that the file is
`FS_NOCOW_FL`, and then closing the file, over and over again.  It will
also avoid emitting a lot of noisy log messages.

Fixes: https://github.com/Zygo/bees/issues/313
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
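The filter amounts to a single bit test on the inode flags word (`BTRFS_INODE_NODATASUM` is bit 0, per the header fallback added elsewhere in this changeset).  `worth_scanning` is an illustrative name, not the bees code:

```cpp
#include <cassert>
#include <cstdint>

// Bit 0 of the btrfs inode flags marks a nodatasum (nodatacow) inode.
constexpr uint64_t BTRFS_INODE_NODATASUM_FLAG = 1ULL << 0;

// Cheap filter: a nodatasum inode can never share extents with a datacow
// file, so skip it before opening any of its extent references.
bool worth_scanning(uint64_t inode_flags) {
	return (inode_flags & BTRFS_INODE_NODATASUM_FLAG) == 0;
}
```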
Zygo Blaxell
e9e6870de8 fs: add btrfs_inode_flags_ntoa
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
16e3dd7f60 btrfs: copy BTRFS_INODE_* flags to build on linux-libc-dev < 6.2
Yet another "this will build on every environment but yours" change.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
c658831852 btrfs-tree: add support for inode flags
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
e852e3998a openat2: LINUX_VERSION_CODE is defined by linux-libc-dev, not libc
With new kernel headers and an old libc, `SYS_openat2` can still end up
undefined, which triggers the fallback build-time code, and that fallback
doesn't build:

```
openat2.cc: In function 'int openat2(int, const char*, open_how*, size_t)':
openat2.cc:35:2: error: 'errno' was not declared in this scope
   35 |  errno = ENOSYS;
      |  ^~~~~
openat2.cc:24:1: note: 'errno' is defined in header '<cerrno>'; did you forget to '#include <cerrno>'?
   23 | #include <unistd.h>
  +++ |+#include <cerrno>
   24 |
openat2.cc:35:10: error: 'ENOSYS' was not declared in this scope
   35 |  errno = ENOSYS;
      |          ^~~~~~
openat2.cc:29:19: error: unused parameter 'dirfd' [-Werror=unused-parameter]
   29 | openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
      |         ~~~~~~~~~~^~~~~
openat2.cc:29:44: error: unused parameter 'pathname' [-Werror=unused-parameter]
   29 | openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
      |                          ~~~~~~~~~~~~~~~~~~^~~~~~~~
openat2.cc:29:77: error: unused parameter 'how' [-Werror=unused-parameter]
   29 | t dirfd, const char *const pathname, struct open_how *const how, size_t const size)
      |                                      ~~~~~~~~~~~~~~~~~~~~~~~^~~

openat2.cc:29:95: error: unused parameter 'size' [-Werror=unused-parameter]
   29 | st char *const pathname, struct open_how *const how, size_t const size)
      |                                                      ~~~~~~~~~~~~~^~~~
```

Skip the kernel version check and test for the definition of `SYS_openat2`
directly.  If it's not there, plug in the constant so we can send the
call directly to the kernel, bypassing libc completely.

Fixes: https://github.com/Zygo/bees/issues/318
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
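A hedged sketch of the approach: test for `SYS_openat2` directly instead of inferring it from `LINUX_VERSION_CODE`, and supply the syscall number ourselves if the headers lack it (`openat2` is syscall 437 on all architectures).  `openat2_compat` and `open_how_compat` are illustrative names, not the bees code:

```cpp
#include <cerrno>        // the include the broken fallback was missing
#include <cstdint>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437  // plug in the constant so we can call the kernel directly
#endif

// Layout matches struct open_how from linux/openat2.h.
struct open_how_compat { uint64_t flags; uint64_t mode; uint64_t resolve; };

// Bypass libc completely and send the call straight to the kernel.
static int openat2_compat(int dirfd, const char *pathname,
                          open_how_compat *how, size_t size) {
	return static_cast<int>(syscall(SYS_openat2, dirfd, pathname, how, size));
}
```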
Zygo Blaxell
5c0480ec59 progress: calculate point along the range 000000..999999 to avoid 7-digit columns
With the "idle" tag moved out of the `point` column, a `point` value of
1000000 may become visible--and push the table one column to the right.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
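The clamped computation can be sketched as follows (`progress_point` is a hypothetical helper; it assumes `done * 999999` fits in 64 bits):

```cpp
#include <cassert>
#include <cstdint>

// Map progress onto 000000..999999 so the "point" column is always at
// most six digits wide, even when the crawl is 100% complete.
uint32_t progress_point(uint64_t done, uint64_t total) {
	if (total == 0) return 0;
	if (done >= total) return 999999; // never emit the 7-digit 1000000
	return static_cast<uint32_t>((done * 999999) / total);
}
```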
Zygo Blaxell
1b8b7557b6 progress: base progress estimates on queued extents, not completed ones
This means the progress table in the status output reflects the state of
the oldest task in the queue, not the newest.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Kyle Gospodnetich
f9f3913c8b Add configurable bindir for distros without sbin
Adds a `BINDIR` Make variable, defaulting to `sbin`, allowing packagers
to override the install location of `beesd` for systems that do not use
`/sbin`.  This affects the install path and systemd unit template.
2025-07-04 19:44:35 -04:00
Zygo Blaxell
ee5c971d77 fsync: fix signed comparison of stf.f_type
Build fails on 32-bit Slackware because GCC 11's `-Werror=sign-compare`
is stricter than necessary:

	cc -Wall -Wextra -Werror -O3 -I../include -D_FILE_OFFSET_BITS=64 -std=c99 -O2 -march=i586 -mtune=i686 -o bees-version.o -c bees-version.c
	bees.cc: In function 'void bees_fsync(int)':
	bees.cc:426:24: error: comparison of integer expressions of different signedness: '__fsword_t' {aka 'int'} and 'unsigned int' [-Werror=sign-compare]
	  426 |         if (stf.f_type != BTRFS_SUPER_MAGIC) {
	      |                        ^

To work around this, cast `stf.f_type` to the same type as
`BTRFS_SUPER_MAGIC`, so the comparison happens with the same width and
signedness as the magic value.

Fixes: https://github.com/Zygo/bees/issues/317
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-03 21:48:40 -04:00
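The shape of the workaround, sketched with the real magic value from `linux/magic.h` but hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// On 32-bit glibc, statfs::f_type is a signed __fsword_t while the magic
// constant is unsigned, so `stf.f_type != BTRFS_SUPER_MAGIC` trips
// -Werror=sign-compare.  Casting f_type to the constant's type gives both
// operands the same width and signedness.
constexpr uint32_t BTRFS_SUPER_MAGIC_VALUE = 0x9123683E; // from linux/magic.h

bool is_btrfs_magic(long f_type) { // stands in for the signed stf.f_type
	return static_cast<uint32_t>(f_type) == BTRFS_SUPER_MAGIC_VALUE;
}
```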
Zygo Blaxell
d37f916507 tempfile: don't need to update the inode if the flags don't change
A small performance optimization, given that we are constantly clobbering
the file with new content.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-29 23:34:10 -04:00
Zygo Blaxell
3a17a4dcdd tempfile: make sure FS_COMPR_FL stays set
btrfs will set the FS_NOCOMP_FL flag when all of the following are true:

1.  The filesystem is not mounted with the `compress-force` option
2.  Heuristic analysis of the data suggests the data is compressible
3.  Compression fails to produce a result that is smaller than the original

If the compression ratio is 40%, and the original data is 128K long,
then compressed data will be about 52K long (rounded up to 4K), so item
3 is usually false; however, if the original data is 8K long, then the
compressed data will be 8K long too, and btrfs will set FS_NOCOMP_FL.

To work around that, keep setting FS_COMPR_FL and clearing FS_NOCOMP_FL
every time a TempFile is reset.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-29 23:25:36 -04:00
Zygo Blaxell
4039ef229e tempfile: clear FS_NOCOW_FL while setting FS_COMPR_FL
FS_NOCOW_FL can be inherited from the subvol root directory, and it
conflicts with FS_COMPR_FL.

We can only dedupe when FS_NOCOW_FL is the same on src and dst, which
means we can only dedupe when FS_NOCOW_FL is clear, so we should clear
FS_NOCOW_FL on the temporary files we create for dedupe.

Fixes: https://github.com/Zygo/bees/issues/314
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-29 23:24:55 -04:00
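The three tempfile commits above combine into one piece of flag arithmetic: force `FS_COMPR_FL` on, clear the conflicting `FS_NOCOMP_FL` and `FS_NOCOW_FL`, and skip the inode update when nothing changes.  A hedged sketch with hypothetical names, not the bees code:

```cpp
#include <cassert>
#include <linux/fs.h>   // FS_COMPR_FL, FS_NOCOMP_FL, FS_NOCOW_FL, FS_IOC_*FLAGS
#include <sys/ioctl.h>

// Pure helper: compute the flag word we want on a dedupe temporary file.
constexpr int tempfile_flags(int flags) {
	return (flags | FS_COMPR_FL) & ~(FS_NOCOMP_FL | FS_NOCOW_FL);
}

// Apply it to an open fd, skipping the SETFLAGS ioctl when the inode
// already has the desired flags.
inline int reset_tempfile_flags(int fd) {
	int flags = 0;
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) return -1;
	const int want = tempfile_flags(flags);
	if (want == flags) return 0;  // no inode update needed
	return ioctl(fd, FS_IOC_SETFLAGS, &want);
}
```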
Zygo Blaxell
e9d4aa4586 roots: make the "idle" label useful
Apply the "idle" label only when the crawl is finished _and_ its
transid_max is up to date.  This makes the keyword "idle" better reflect
when bees has not only finished crawling, but has also finished scanning
the crawled extents in the queue.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 23:06:14 -04:00
Zygo Blaxell
504f4cda80 progress: move the "idle" cell to the next cycle ETA column
When all extents within a size tier have been queued, and all the
extents belong to the same file, the queue might take a long time to
fully process.  Also, any progress that is made will be obscured by
the "idle" tag in the "point" column.

Move "idle" to the next cycle ETA column, since the ETA duration will
be zero, and no useful information is lost since we would have "-"
there anyway.

Since the "point" column can now display the maximum value, lower
that maximum to 999999 so that we don't use an extra column.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 22:33:05 -04:00
Zygo Blaxell
6c36f4973f extent scan: log the bfr when removing a prealloc extent
With subvol scan, the crawl task name is the subvol/inode pair
corresponding to the file offset in the log message.  The identity of
the file can therefore be determined by looking up that subvol/inode
pair.

With extent scan, the crawl task name is the extent bytenr corresponding
to the file offset in the log message.  This extent is deleted when the
log message is emitted, so a later lookup on the extent bytenr will not
find any references to the extent, and the identity of the file cannot
be determined.

Log the bfr, which does a /proc lookup on the name of the fd, so the
filename is logged.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 22:33:05 -04:00
Zygo Blaxell
b1bd99c077 seeker: harden against changes in the data during binary search
During the search, the region between `upper_bound` and `target_pos`
should contain no data items.  The search lowers `upper_bound` and raises
`lower_bound` until they both point to the last item before `target_pos`.

The `lower_bound` is increased to the position of the last item returned
by a search (`high_pos`) when that item is lower than `target_pos`.
This avoids some loop iterations compared to a strict binary search
algorithm, which would increase `lower_bound` only as far as `probe_pos`.

When the search runs over live extent items, occasionally a new extent
will appear between `upper_bound` and `target_pos`.  When this happens,
`lower_bound` is bumped up to the position of one of the new items, but
that position is in the "unoccupied" space between `upper_bound` and
`target_pos`, where no items are supposed to exist, so `seek_backward`
throws an exception.

To cut down on the noise, only increase `lower_bound` as far as
`upper_bound`.  This avoids the exception without increasing the number
of loop iterations for normal cases.

In the exceptional cases, extra loop iterations are needed to skip over
the new items.  This raises the worst-case number of loop iterations
by one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
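The hardening reduces to one bounded update: when an item is found at `high_pos` below `target_pos`, raise `lower_bound` toward it, but never past `upper_bound`, even if a concurrently created item landed in the gap that is supposed to be empty.  An illustrative helper, not the bees code itself:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Cap the new lower bound at the current upper bound so that
// lower_bound <= upper_bound always holds, even when the underlying
// data changed mid-search.
uint64_t raise_lower_bound(uint64_t high_pos, uint64_t upper_bound) {
	return std::min(high_pos, upper_bound);
}
```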
Zygo Blaxell
d5e805ab8d seeker: add a real-world test case
This seek_backward failed in bees because an extent appeared during
the search:

	fetch(probe_pos = 6821971036, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821971004 = probe_pos - have_delta 32 (want_delta 32)
	fetch(probe_pos = 6821971004, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970972 = probe_pos - have_delta 32 (want_delta 32)
	fetch(probe_pos = 6821970972, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970908 = probe_pos - have_delta 64 (want_delta 64)
	fetch(probe_pos = 6821970908, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970780 = probe_pos - have_delta 128 (want_delta 128)
	fetch(probe_pos = 6821970780, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970524 = probe_pos - have_delta 256 (want_delta 256)
	fetch(probe_pos = 6821970524, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821970012 = probe_pos - have_delta 512 (want_delta 512)
	fetch(probe_pos = 6821970012, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821968988 = probe_pos - have_delta 1024 (want_delta 1024)
	fetch(probe_pos = 6821968988, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821966940 = probe_pos - have_delta 2048 (want_delta 2048)
	fetch(probe_pos = 6821966940, target_pos = 6821971036)
	 = 6822575316..6822575316
	probe_pos 6821962844 = probe_pos - have_delta 4096 (want_delta 4096)
	fetch(probe_pos = 6821962844, target_pos = 6821971036)
	 = 6821962845..6821962848
	found_low = true, lower_bound = 6821962845
	lower_bound = high_pos 6821962848
	loop: lower_bound 6821962848, probe_pos 6821966942, upper_bound 6821971036
	fetch(probe_pos = 6821966942, target_pos = 6821971036)
	 = 6822575316..6822575316
	upper_bound = probe_pos 6821966942
	loop: lower_bound 6821962848, probe_pos 6821964895, upper_bound 6821966942
	fetch(probe_pos = 6821964895, target_pos = 6821971036)
	 = 6822575316..6822575316
	upper_bound = probe_pos 6821964895
	loop: lower_bound 6821962848, probe_pos 6821963871, upper_bound 6821964895
	fetch(probe_pos = 6821963871, target_pos = 6821971036)
	 = 6822575316..6822575316
	upper_bound = probe_pos 6821963871
	loop: lower_bound 6821962848, probe_pos 6821963359, upper_bound 6821963871
	fetch(probe_pos = 6821963359, target_pos = 6821971036)
	 = 6821963411..6821963422
	lower_bound = high_pos 6821963422
	loop: lower_bound 6821963422, probe_pos 6821963646, upper_bound 6821963871
	fetch(probe_pos = 6821963646, target_pos = 6821971036)
	 = 6822575316..6822575316

Here, we found nothing between 6821963646 and 6822575316, so upper_bound is reduced
to 6821963646...

	upper_bound = probe_pos 6821963646
	loop: lower_bound 6821963422, probe_pos 6821963534, upper_bound 6821963646
	fetch(probe_pos = 6821963534, target_pos = 6821971036)
	 = 6821963536..6821963539
	lower_bound = high_pos 6821963539
	loop: lower_bound 6821963539, probe_pos 6821963592, upper_bound 6821963646
	fetch(probe_pos = 6821963592, target_pos = 6821971036)
	 = 6821963835..6821963841

...but here, we found 6821963835 and 6821963841, which are between
6821963646 and 6822575316.  They were not there before, so the binary
search result is now invalid because new extent items were added while
it was running.  This results in an exception:

	lower_bound = high_pos 6821963841
	--- BEGIN TRACE --- exception ---
	objectid = 27942759813120, adjusted to 27942793363456 at bees-roots.cc:1103
	Crawling extent BeesCrawlState 250:0 offset 0x0 transid 1311734..1311735 at bees-roots.cc:991
	get_state_end at bees-roots.cc:988
	find_next_extent 250 at bees-roots.cc:929
	---  END  TRACE --- exception ---
	*** EXCEPTION ***
	exception type std::out_of_range: lower_bound = 6821963841, upper_bound = 6821963646 failed constraint check (lower_bound <= upper_bound) at ../include/crucible/seeker.h:139

The exception prevents the result of seek_backward from returning a value,
which prevents a nonsense result from a consumer of that value.

Copy the details of this search into a test case.  Note that the test
case won't reproduce the exception because the simulation of fetch()
is not changing the results part way through.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
337bbffac1 extent scan: drop a nonsense trace message
This message appears only during exception backtraces, but it doesn't
carry any useful information.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
527396e5cb extent scan: integrate seeker debug output stream
Send both tree_search ioctl and `seek_backward` debug logs to the
same output stream, but only write that stream to the debug log if
there is an exception.

The feature remains disabled at compile time.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
bc7c35aa2d extent scan: only write a detailed debug log when there's an exception
Note that when enabled, the logs are still very CPU-intensive,
but most of the logs will be discarded.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
0953160584 trace: export exception_check
We need to call this from more than one place in bees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
80f9c147f7 btrfs-tree: clean up the fetch function's return set
Commit d32f31f411 ("btrfs-tree: harden
`rlower_bound` against exceptional objects") passes the first btrfs item
in the result set that is above upper_bound up to `seek_backward`.
This is somewhat wasteful as `seek_backward` cannot use such a result.

Reverse that change in behavior, while keeping the rest of that commit.

This introduces a new case: the search ioctl produces only items above
the upper bound, so the filtered result set is empty, and the search
would keep looping until it reached the end of the filesystem.  Handle
that case by setting an explicit exit variable.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
50e012ad6d seeker: add a runtime debug stream
This allows detailed but selective debugging when using the library,
particularly when something goes wrong.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
9a9644659c trace: clean up the formatting around top-level exception log messages
Fewer newlines.  More consistent application of the "TRACE:" prefix.
All at the same log level.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
fd53bff959 extent scan: drop out-of-date comment
The comment describes an earlier version which submitted each extent
ref as a separate Task, but now all extent refs are handled by the same
Task to minimize the amount of time between processing the first and
last reference to an extent.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
9439dad93a extent scan: extra check to make sure no Tasks are started when throttled
Previously `scan()` would run the extent scan loop once, and enqueue one
extent, before checking for throttling.  Add an extra check before the
loop, and bail out so that zero extents are enqueued when throttled.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
ef9b4b3a50 extent scan: shorten task name for extent map
Linux kernel thread names are hardcoded at 16 characters.  Every character
counts, and "0x" wastes two.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
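Since the kernel truncates thread names to 16 bytes (`TASK_COMM_LEN` includes the trailing NUL), every visible character counts.  A sketch of the prefix-free hex formatting; the `map ` prefix is illustrative only, not the name bees actually uses:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format the extent bytenr in hex without the "0x" prefix, saving two
// of the at most 15 visible characters in a kernel thread name.
std::string extent_task_name(uint64_t bytenr) {
	char buf[32];
	std::snprintf(buf, sizeof(buf), "map %llx",
	              static_cast<unsigned long long>(bytenr));
	return std::string(buf);
}
```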
Zygo Blaxell
7ca857dff0 docs: add the ghost subvols bug to the bugs list
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
Zygo Blaxell
8331f70db7 progress: fix ETA calculations
The "tm_left" field was the estimated _total_ duration of the crawl,
not the amount of time remaining.  The ETA timestamp was then calculated
based on the estimated time to run the crawl if it started _now_, not
at the start timestamp.

Fix the duration and ETA calculations.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-06-18 21:17:48 -04:00
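The corrected arithmetic can be sketched as: estimate the total crawl duration from the fraction completed, then derive both the time remaining and the ETA from the crawl's start timestamp, not from "now".  Hypothetical names (`crawl_eta`, `Eta`), not the bees code:

```cpp
#include <cassert>
#include <cstdint>

struct Eta {
	uint64_t seconds_left;   // remaining duration, not total
	uint64_t eta_timestamp;  // anchored at the start timestamp
};

Eta crawl_eta(uint64_t start_ts, uint64_t now_ts, uint64_t done, uint64_t total) {
	const uint64_t elapsed = now_ts - start_ts;
	// scale elapsed time by the inverse of the fraction completed
	const uint64_t est_total = done ? (elapsed * total) / done : 0;
	const uint64_t left = est_total > elapsed ? est_total - elapsed : 0;
	return Eta{left, start_ts + est_total};
}
```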
21 changed files with 349 additions and 207 deletions

@@ -4,6 +4,7 @@ define TEMPLATE_COMPILER =
sed $< >$@ \
-e's#@DESTDIR@#$(DESTDIR)#' \
-e's#@PREFIX@#$(PREFIX)#' \
-e's#@BINDIR@#$(BINDIR)#' \
-e's#@ETC_PREFIX@#$(ETC_PREFIX)#' \
-e's#@LIBEXEC_PREFIX@#$(LIBEXEC_PREFIX)#'
endef

@@ -1,6 +1,7 @@
PREFIX ?= /usr
ETC_PREFIX ?= /etc
LIBDIR ?= lib
BINDIR ?= sbin
LIB_PREFIX ?= $(PREFIX)/$(LIBDIR)
LIBEXEC_PREFIX ?= $(LIB_PREFIX)/bees
@@ -55,7 +56,7 @@ install_bees: src $(RUN_INSTALL_TESTS)
install_scripts: ## Install scipts
install_scripts: scripts
install -Dm755 scripts/beesd $(DESTDIR)$(PREFIX)/sbin/beesd
install -Dm755 scripts/beesd $(DESTDIR)$(PREFIX)/$(BINDIR)/beesd
install -Dm644 scripts/beesd.conf.sample $(DESTDIR)$(ETC_PREFIX)/bees/beesd.conf.sample
ifneq ($(SYSTEMD_SYSTEM_UNIT_DIR),)
install -Dm644 scripts/beesd@.service $(DESTDIR)$(SYSTEMD_SYSTEM_UNIT_DIR)/beesd@.service

@@ -55,6 +55,7 @@ These bugs are particularly popular among bees users, though not all are specifi
| 5.4 | 5.11 | spurious tree checker failures on extent ref hash | 5.4.125, 5.10.43, 5.11.5, 5.12 and later | 1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match
| - | 5.11 | tree mod log issue #5 | 4.4.263, 4.9.263, 4.14.227, 4.19.183, 5.4.108, 5.10.26, 5.11.9, 5.12 and later | dbcc7d57bffc btrfs: fix race when cloning extent buffer during rewind of an old root
| - | 5.12 | tree mod log issue #6 | 4.14.233, 4.19.191, 5.4.118, 5.10.36, 5.11.20, 5.12.3, 5.13 and later | f9690f426b21 btrfs: fix race when picking most recent mod log operation for an old root
| 5.11 | 5.12 | subvols marked for deletion with `btrfs sub del` become permanently undeletable ("ghost" subvols) | 5.12 stopped creation of new ghost subvols | Partially fixed in 8d488a8c7ba2 btrfs: fix subvolume/snapshot deletion not triggered on mount. Qu wrote a [patch](https://github.com/adam900710/linux/commit/9de990fcc8864c376eb28aa7482c54321f94acd4) to allow `btrfs sub del -i` to remove "ghost" subvols, but it was never merged upstream.
| 4.15 | 5.16 | spurious warnings from `fs/fs-writeback.c` when `flushoncommit` is enabled | 5.15.27, 5.16.13, 5.17 and later | a0f0cf8341e3 btrfs: get rid of warning on transaction commit when using flushoncommit
| - | 5.17 | crash during device removal can make filesystem unmountable | 5.15.54, 5.16.20, 5.17.3, 5.18 and later | bbac58698a55 btrfs: remove device item and update super block in the same transaction
| - | 5.18 | wrong superblock num_devices makes filesystem unmountable | 4.14.283, 4.19.247, 5.4.198, 5.10.121, 5.15.46, 5.17.14, 5.18.3, 5.19 and later | d201238ccd2f btrfs: repair super block num_devices automatically

@@ -49,6 +49,7 @@ namespace crucible {
/// @}
/// @{ Inode items
uint64_t inode_flags() const;
uint64_t inode_size() const;
/// @}

@@ -91,7 +91,23 @@ enum btrfs_compression_type {
#define BTRFS_UUID_KEY_SUBVOL 251
#define BTRFS_UUID_KEY_RECEIVED_SUBVOL 252
#define BTRFS_STRING_ITEM_KEY 253
#endif
// BTRFS_INODE_* was added to include/uapi/btrfs_tree.h in v6.2-rc1
#ifndef BTRFS_INODE_NODATASUM
#define BTRFS_INODE_NODATASUM (1U << 0)
#define BTRFS_INODE_NODATACOW (1U << 1)
#define BTRFS_INODE_READONLY (1U << 2)
#define BTRFS_INODE_NOCOMPRESS (1U << 3)
#define BTRFS_INODE_PREALLOC (1U << 4)
#define BTRFS_INODE_SYNC (1U << 5)
#define BTRFS_INODE_IMMUTABLE (1U << 6)
#define BTRFS_INODE_APPEND (1U << 7)
#define BTRFS_INODE_NODUMP (1U << 8)
#define BTRFS_INODE_NOATIME (1U << 9)
#define BTRFS_INODE_DIRSYNC (1U << 10)
#define BTRFS_INODE_COMPRESS (1U << 11)
#define BTRFS_INODE_ROOT_ITEM_INIT (1U << 31)
#endif
#ifndef BTRFS_FREE_SPACE_INFO_KEY

@@ -208,6 +208,7 @@ namespace crucible {
ostream & operator<<(ostream &os, const BtrfsIoctlSearchKey &key);
string btrfs_chunk_type_ntoa(uint64_t type);
string btrfs_inode_flags_ntoa(uint64_t inode_flags);
string btrfs_search_type_ntoa(unsigned type);
string btrfs_search_objectid_ntoa(uint64_t objectid);
string btrfs_compress_type_ntoa(uint8_t type);

@@ -6,23 +6,23 @@
#include <algorithm>
#include <limits>
#include <cstdint>
#if 0
// Debug stream
#include <memory>
#include <iostream>
#include <sstream>
#define DINIT(__x) __x
#define DLOG(__x) do { logs << __x << std::endl; } while (false)
#define DOUT(__err) do { __err << logs.str(); } while (false)
#else
#define DINIT(__x) do {} while (false)
#define DLOG(__x) do {} while (false)
#define DOUT(__x) do {} while (false)
#endif
#include <cstdint>
namespace crucible {
using namespace std;
extern thread_local shared_ptr<ostream> tl_seeker_debug_str;
#define SEEKER_DEBUG_LOG(__x) do { \
if (tl_seeker_debug_str) { \
(*tl_seeker_debug_str) << __x << "\n"; \
} \
} while (false)
// Requirements for Container<Pos> Fetch(Pos lower, Pos upper):
// - fetches objects in Pos order, starting from lower (must be >= lower)
// - must return upper if present, may or may not return objects after that
@@ -49,113 +49,108 @@ namespace crucible {
Pos
seek_backward(Pos const target_pos, Fetch fetch, Pos min_step = 1, size_t max_loops = numeric_limits<size_t>::max())
{
DINIT(ostringstream logs);
try {
static const Pos end_pos = numeric_limits<Pos>::max();
// TBH this probably won't work if begin_pos != 0, i.e. any signed type
static const Pos begin_pos = numeric_limits<Pos>::min();
// Run a binary search looking for the highest key below target_pos.
// Initial upper bound of the search is target_pos.
// Find initial lower bound by doubling the size of the range until a key below target_pos
// is found, or the lower bound reaches the beginning of the search space.
// If the lower bound search reaches the beginning of the search space without finding a key,
// return the beginning of the search space; otherwise, perform a binary search between
// the bounds now established.
Pos lower_bound = 0;
Pos upper_bound = target_pos;
bool found_low = false;
Pos probe_pos = target_pos;
// We need one loop for each bit of the search space to find the lower bound,
// one loop for each bit of the search space to find the upper bound,
// and one extra loop to confirm the boundary is correct.
for (size_t loop_count = min(numeric_limits<Pos>::digits * size_t(2) + 1, max_loops); loop_count; --loop_count) {
DLOG("fetch(probe_pos = " << probe_pos << ", target_pos = " << target_pos << ")");
auto result = fetch(probe_pos, target_pos);
const Pos low_pos = result.empty() ? end_pos : *result.begin();
const Pos high_pos = result.empty() ? end_pos : *result.rbegin();
DLOG(" = " << low_pos << ".." << high_pos);
// check for correct behavior of the fetch function
THROW_CHECK2(out_of_range, high_pos, probe_pos, probe_pos <= high_pos);
THROW_CHECK2(out_of_range, low_pos, probe_pos, probe_pos <= low_pos);
THROW_CHECK2(out_of_range, low_pos, high_pos, low_pos <= high_pos);
if (!found_low) {
// if target_pos == end_pos then we will find it in every empty result set,
// so in that case we force the lower bound to be lower than end_pos
if ((target_pos == end_pos) ? (low_pos < target_pos) : (low_pos <= target_pos)) {
// found a lower bound, set the low bound there and switch to binary search
found_low = true;
lower_bound = low_pos;
DLOG("found_low = true, lower_bound = " << lower_bound);
} else {
// still looking for lower bound
// if probe_pos was begin_pos then we can stop with no result
if (probe_pos == begin_pos) {
DLOG("return: probe_pos == begin_pos " << begin_pos);
return begin_pos;
}
// double the range size, or use the distance between objects found so far
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
// already checked low_pos <= high_pos above
const Pos want_delta = max(upper_bound - probe_pos, min_step);
// avoid underflowing the beginning of the search space
const Pos have_delta = min(want_delta, probe_pos - begin_pos);
THROW_CHECK2(out_of_range, want_delta, have_delta, have_delta <= want_delta);
// move probe and try again
probe_pos = probe_pos - have_delta;
DLOG("probe_pos " << probe_pos << " = probe_pos - have_delta " << have_delta << " (want_delta " << want_delta << ")");
continue;
static const Pos end_pos = numeric_limits<Pos>::max();
// TBH this probably won't work if begin_pos != 0, i.e. any signed type
static const Pos begin_pos = numeric_limits<Pos>::min();
// Run a binary search looking for the highest key below target_pos.
// Initial upper bound of the search is target_pos.
// Find initial lower bound by doubling the size of the range until a key below target_pos
// is found, or the lower bound reaches the beginning of the search space.
// If the lower bound search reaches the beginning of the search space without finding a key,
// return the beginning of the search space; otherwise, perform a binary search between
// the bounds now established.
Pos lower_bound = 0;
Pos upper_bound = target_pos;
bool found_low = false;
Pos probe_pos = target_pos;
// We need one loop for each bit of the search space to find the lower bound,
// one loop for each bit of the search space to find the upper bound,
// and one extra loop to confirm the boundary is correct.
for (size_t loop_count = min((1 + numeric_limits<Pos>::digits) * size_t(2), max_loops); loop_count; --loop_count) {
SEEKER_DEBUG_LOG("fetch(probe_pos = " << probe_pos << ", target_pos = " << target_pos << ")");
auto result = fetch(probe_pos, target_pos);
const Pos low_pos = result.empty() ? end_pos : *result.begin();
const Pos high_pos = result.empty() ? end_pos : *result.rbegin();
SEEKER_DEBUG_LOG(" = " << low_pos << ".." << high_pos);
// check for correct behavior of the fetch function
THROW_CHECK2(out_of_range, high_pos, probe_pos, probe_pos <= high_pos);
THROW_CHECK2(out_of_range, low_pos, probe_pos, probe_pos <= low_pos);
THROW_CHECK2(out_of_range, low_pos, high_pos, low_pos <= high_pos);
if (!found_low) {
// if target_pos == end_pos then we will find it in every empty result set,
// so in that case we force the lower bound to be lower than end_pos
if ((target_pos == end_pos) ? (low_pos < target_pos) : (low_pos <= target_pos)) {
// found a lower bound, set the low bound there and switch to binary search
found_low = true;
lower_bound = low_pos;
SEEKER_DEBUG_LOG("found_low = true, lower_bound = " << lower_bound);
} else {
// still looking for lower bound
// if probe_pos was begin_pos then we can stop with no result
if (probe_pos == begin_pos) {
SEEKER_DEBUG_LOG("return: probe_pos == begin_pos " << begin_pos);
return begin_pos;
}
// double the range size, or use the distance between objects found so far
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
// already checked low_pos <= high_pos above
const Pos want_delta = max(upper_bound - probe_pos, min_step);
// avoid underflowing the beginning of the search space
const Pos have_delta = min(want_delta, probe_pos - begin_pos);
THROW_CHECK2(out_of_range, want_delta, have_delta, have_delta <= want_delta);
// move probe and try again
probe_pos = probe_pos - have_delta;
SEEKER_DEBUG_LOG("probe_pos " << probe_pos << " = probe_pos - have_delta " << have_delta << " (want_delta " << want_delta << ")");
continue;
}
if (low_pos <= target_pos && target_pos <= high_pos) {
// have keys on either side of target_pos in result
// search from the high end until we find the highest key below target
for (auto i = result.rbegin(); i != result.rend(); ++i) {
// more correctness checking for fetch
THROW_CHECK2(out_of_range, *i, probe_pos, probe_pos <= *i);
if (*i <= target_pos) {
SEEKER_DEBUG_LOG("return: *i " << *i << " <= target_pos " << target_pos);
return *i;
}
}
// if the list is empty then low_pos = high_pos = end_pos
// if target_pos = end_pos also, then we will execute the loop
// above but not find any matching entries.
THROW_CHECK0(runtime_error, result.empty());
}
if (target_pos <= low_pos) {
// results are all too high, so probe_pos..low_pos is too high
// lower the high bound to the probe pos, low_pos cannot be lower
SEEKER_DEBUG_LOG("upper_bound = probe_pos " << probe_pos);
upper_bound = probe_pos;
}
if (high_pos < target_pos) {
// results are all too low, so probe_pos..high_pos is too low
// raise the low bound to high_pos but not above upper_bound
const auto next_pos = min(high_pos, upper_bound);
SEEKER_DEBUG_LOG("lower_bound = next_pos " << next_pos);
lower_bound = next_pos;
}
// compute a new probe pos at the middle of the range and try again
// we can't have a zero-size range here because we would not have set found_low yet
THROW_CHECK2(out_of_range, lower_bound, upper_bound, lower_bound <= upper_bound);
const Pos delta = (upper_bound - lower_bound) / 2;
probe_pos = lower_bound + delta;
if (delta < 1) {
// nothing can exist in the range (lower_bound, upper_bound)
// and an object is known to exist at lower_bound
SEEKER_DEBUG_LOG("return: probe_pos == lower_bound " << lower_bound);
return lower_bound;
}
THROW_CHECK2(out_of_range, lower_bound, probe_pos, lower_bound <= probe_pos);
THROW_CHECK2(out_of_range, upper_bound, probe_pos, probe_pos <= upper_bound);
SEEKER_DEBUG_LOG("loop bottom: lower_bound " << lower_bound << ", probe_pos " << probe_pos << ", upper_bound " << upper_bound);
}
THROW_ERROR(runtime_error, "FIXME: should not reach this line: "
"lower_bound..upper_bound " << lower_bound << ".." << upper_bound << ", "
"found_low " << found_low);
} catch (...) {
DOUT(cerr);
throw;
}
}
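Reduced to its core, the backward-probing strategy above (grow the step each time the probe overshoots backward, stop once a key lands at or below the target) can be sketched as follows. This is a simplified illustration, not the crucible implementation: it omits the bisection refinement the real `seek_backward` uses to minimize fetches, and `fetch` here is a linear scan standing in for a btrfs tree search.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// fetch(lo, hi) returns the highest key in [lo, hi], if any.
// Probe backward from the target with doubling deltas until a key
// is bracketed, then return the best hit.
std::optional<uint64_t> seek_backward_sketch(const std::vector<uint64_t> &sorted_keys,
                                             uint64_t target)
{
	auto fetch = [&](uint64_t lo, uint64_t hi) -> std::optional<uint64_t> {
		std::optional<uint64_t> best;
		for (auto k : sorted_keys)
			if (k >= lo && k <= hi) best = k;   // keys are sorted: last hit is highest
		return best;
	};
	uint64_t probe = target;
	uint64_t delta = 1;
	for (;;) {
		if (auto hit = fetch(probe, target)) return hit;
		if (probe == 0) return std::nullopt;    // nothing at or below target
		probe -= std::min(delta, probe);        // move probe back, clamped at origin
		delta *= 2;                             // double the range size each miss
	}
}
```

Each miss widens the next backward step, so an object `d` below the target is found in O(log d) fetches rather than d of them.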

@@ -17,6 +17,7 @@ CRUCIBLE_OBJS = \
openat2.o \
path.o \
process.o \
seeker.o \
string.o \
table.o \
task.o \

@@ -157,6 +157,13 @@ namespace crucible {
return btrfs_get_member(&btrfs_inode_item::size, m_data);
}
uint64_t
BtrfsTreeItem::inode_flags() const
{
THROW_CHECK1(invalid_argument, btrfs_search_type_ntoa(m_type), m_type == BTRFS_INODE_ITEM_KEY);
return btrfs_get_member(&btrfs_inode_item::flags, m_data);
}
uint64_t
BtrfsTreeItem::file_extent_logical_bytes() const
{
@@ -418,6 +425,7 @@ namespace crucible {
++loops;
fill_sk(sk, unscale_logical(min(scaled_max_logical(), lower_bound)));
set<uint64_t> rv;
bool too_far = false;
do {
sk.nr_items = 4;
sk.do_ioctl(fd());
@@ -426,6 +434,7 @@ namespace crucible {
next_sk(sk, i);
// If hdr_stop or !hdr_match, don't inspect the item
if (hdr_stop(i)) {
too_far = true;
rv.insert(numeric_limits<uint64_t>::max());
BTFRLB_DEBUG("(stop)");
break;
@@ -438,22 +447,23 @@ namespace crucible {
BTFRLB_DEBUG(" " << to_hex(this_logical) << " " << i);
const auto scaled_hdr_logical = scale_logical(this_logical);
BTFRLB_DEBUG(" " << "(match)");
if (scaled_hdr_logical > upper_bound) {
too_far = true;
BTFRLB_DEBUG("(" << to_hex(scaled_hdr_logical) << " >= " << to_hex(upper_bound) << ")");
break;
}
if (this_logical <= logical && this_logical > closest_logical) {
closest_logical = this_logical;
closest_item = i;
BTFRLB_DEBUG("(closest)");
}
rv.insert(scaled_hdr_logical);
if (scaled_hdr_logical > upper_bound) {
BTFRLB_DEBUG("(" << to_hex(scaled_hdr_logical) << " >= " << to_hex(upper_bound) << ")");
break;
}
BTFRLB_DEBUG("(cont'd)");
}
BTFRLB_DEBUG(endl);
// We might get a search result that contains only non-matching items.
// Keep looping until we find any matching item or we run out of tree.
} while (rv.empty() && !sk.m_result.empty());
} while (!too_far && rv.empty() && !sk.m_result.empty());
return rv;
}, scale_logical(lookbehind_size()));
return closest_item;

@@ -987,6 +987,28 @@ namespace crucible {
return bits_ntoa(objectid, table);
}
string
btrfs_inode_flags_ntoa(uint64_t const inode_flags)
{
static const bits_ntoa_table table[] = {
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NODATASUM),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NODATACOW),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_READONLY),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NOCOMPRESS),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_PREALLOC),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_SYNC),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_IMMUTABLE),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_APPEND),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NODUMP),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NOATIME),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_DIRSYNC),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_COMPRESS),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_ROOT_ITEM_INIT),
NTOA_TABLE_ENTRY_END()
};
return bits_ntoa(inode_flags, table);
}
ostream &
operator<<(ostream &os, const btrfs_ioctl_search_key &key)
{

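As a standalone illustration of what a `bits_ntoa` table like the one above produces, here is a minimal hypothetical equivalent: each named bit is printed once and consumed, and any leftover unnamed bits fall through as a hex constant. Names and structure are illustrative, not the crucible API.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct BitName { uint64_t bit; const char *name; };

// Decode a flags word into "NAME|NAME|0xREMAINDER" form.
std::string flags_ntoa_sketch(uint64_t flags, const std::vector<BitName> &table)
{
	std::ostringstream oss;
	bool first = true;
	for (const auto &e : table) {
		if (flags & e.bit) {
			oss << (first ? "" : "|") << e.name;
			flags &= ~e.bit;                    // consume the named bit
			first = false;
		}
	}
	if (flags || first)                         // leftover bits, or nothing set
		oss << (first ? "" : "|") << "0x" << std::hex << flags;
	return oss.str();
}
```

With a table of `BTRFS_INODE_*` entries this is what turns a raw inode flags word into the readable string logged by the NODATASUM checks below.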
@@ -4,9 +4,7 @@
// Compatibility for building on old libc for new kernel
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 6, 0)
// Every arch that defines this uses 437, except Alpha, where 437 is
// Every arch that defines this (so far) uses 437, except Alpha, where 437 is
// mq_getsetattr.
#ifndef SYS_openat2
@@ -17,8 +15,6 @@
#endif
#endif
#endif // Linux version >= v5.6
#include <fcntl.h>
#include <unistd.h>
@@ -29,12 +25,7 @@ __attribute__((weak))
openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
throw()
{
#ifdef SYS_openat2
return syscall(SYS_openat2, dirfd, pathname, how, size);
#else
errno = ENOSYS;
return -1;
#endif
}
};

lib/seeker.cc Normal file

@@ -0,0 +1,7 @@
#include "crucible/seeker.h"
namespace crucible {
thread_local shared_ptr<ostream> tl_seeker_debug_str;
};
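A hedged sketch of how such a thread-local debug stream is typically consumed: logging is off (null pointer) unless the current thread installs a stream, and the thread can dump or discard the collected trace afterwards. The `DEBUG_LOG` macro and `traced_work` below are stand-ins, assuming `SEEKER_DEBUG_LOG` in `crucible/seeker.h` follows this null-check-then-stream pattern.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Per-thread debug sink: null by default, so the macro costs one
// pointer test when tracing is disabled.
thread_local std::shared_ptr<std::ostringstream> tl_debug_str;

#define DEBUG_LOG(x) do { \
	if (tl_debug_str) (*tl_debug_str) << x << "\n"; \
} while (0)

std::string traced_work()
{
	tl_debug_str = std::make_shared<std::ostringstream>();
	DEBUG_LOG("step " << 1);
	DEBUG_LOG("step " << 2);
	const auto trace = tl_debug_str->str();
	tl_debug_str.reset();                       // logging off again for this thread
	return trace;
}
```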

@@ -754,7 +754,7 @@ namespace crucible {
m_prev_loadavg = getloadavg1();
if (target && !m_load_tracking_thread) {
m_load_tracking_thread = make_shared<thread>([=] () { loadavg_thread_fn(); });
m_load_tracking_thread = make_shared<thread>([this] () { loadavg_thread_fn(); });
m_load_tracking_thread->detach();
}
}
@@ -944,7 +944,7 @@ namespace crucible {
TaskConsumer::TaskConsumer(const shared_ptr<TaskMasterState> &tms) :
m_master(tms)
{
m_thread = make_shared<thread>([=](){ consumer_thread(); });
m_thread = make_shared<thread>([this](){ consumer_thread(); });
}
class BarrierState {

@@ -5,7 +5,7 @@ After=sysinit.target
[Service]
Type=simple
ExecStart=@PREFIX@/sbin/beesd --no-timestamps %i
ExecStart=@PREFIX@/@BINDIR@/beesd --no-timestamps %i
CPUAccounting=true
CPUSchedulingPolicy=batch
CPUWeight=12

@@ -387,7 +387,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
if (e.flags() & Extent::PREALLOC) {
// Prealloc is all zero and we replace it with a hole.
// No special handling is required here. Nuke it and move on.
BEESLOGINFO("prealloc extent " << e);
BEESLOGINFO("prealloc extent " << e << " in " << bfr);
// Must not extend past EOF
auto extent_size = min(e.end(), bfr.file_size()) - e.begin();
// Must hold tmpfile until dedupe is done
@@ -1126,15 +1126,15 @@ BeesContext::start()
m_progress_thread = make_shared<BeesThread>("progress_report");
m_status_thread = make_shared<BeesThread>("status_report");
m_progress_thread->exec([=]() {
m_progress_thread->exec([this]() {
show_progress();
});
m_status_thread->exec([=]() {
m_status_thread->exec([this]() {
dump_status();
});
// Set up temporary file pool
m_tmpfile_pool.generator([=]() -> shared_ptr<BeesTempFile> {
m_tmpfile_pool.generator([this]() -> shared_ptr<BeesTempFile> {
return make_shared<BeesTempFile>(shared_from_this());
});
m_logical_ino_pool.generator([]() {

@@ -5,6 +5,7 @@
#include "crucible/cleanup.h"
#include "crucible/ntoa.h"
#include "crucible/openat2.h"
#include "crucible/seeker.h"
#include "crucible/string.h"
#include "crucible/table.h"
#include "crucible/task.h"
@@ -182,26 +183,41 @@ BeesScanModeSubvol::crawl_one_inode(const shared_ptr<BeesCrawl>& this_crawl)
}
const auto subvol = this_range.fid().root();
const auto inode = this_range.fid().ino();
ostringstream oss;
oss << "crawl_" << subvol << "_" << inode;
const auto task_title = oss.str();
const auto bfc = make_shared<BeesFileCrawl>((BeesFileCrawl) {
.m_ctx = m_ctx,
.m_crawl = this_crawl,
.m_roots = m_roots,
.m_hold = this_crawl->hold_state(this_state),
.m_state = this_state,
.m_offset = this_range.begin(),
});
BEESNOTE("Starting task " << this_range);
Task(task_title, [bfc]() {
BEESNOTE("crawl_one_inode " << bfc->m_hold->get());
if (bfc->scan_one_ref()) {
// Append the current task to itself to make
// sure we keep a worker processing this file
Task::current_task().append(Task::current_task());
bool run_the_task = false;
catch_all([&]() {
BtrfsInodeFetcher inode_btf(m_ctx->root_fd());
const auto inode_item = inode_btf.stat(subvol, inode);
if (!!inode_item) {
const auto flags = inode_item.inode_flags();
if (0 != (flags & BTRFS_INODE_NODATASUM)) {
BEESLOGDEBUG("unsupported inode flags for ref at root " << subvol << " ino " << inode << ": " << btrfs_inode_flags_ntoa(flags));
} else {
run_the_task = true;
}
}
}).run();
});
if (run_the_task) {
ostringstream oss;
oss << "crawl_" << subvol << "_" << inode;
const auto task_title = oss.str();
const auto bfc = make_shared<BeesFileCrawl>((BeesFileCrawl) {
.m_ctx = m_ctx,
.m_crawl = this_crawl,
.m_roots = m_roots,
.m_hold = this_crawl->hold_state(this_state),
.m_state = this_state,
.m_offset = this_range.begin(),
});
BEESNOTE("Starting task " << this_range);
Task(task_title, [bfc]() {
BEESNOTE("crawl_one_inode " << bfc->m_hold->get());
if (bfc->scan_one_ref()) {
// Append the current task to itself to make
// sure we keep a worker processing this file
Task::current_task().append(Task::current_task());
}
}).run();
}
auto next_state = this_state;
// Skip to EOF. Will repeat up to 16 times if there happens to be an extent at 16EB,
// which would be a neat trick given that off64_t is signed.
@@ -779,14 +795,27 @@ BeesScanModeExtent::SizeTier::create_extent_map(const uint64_t bytenr, const Pro
}
BtrfsExtentDataFetcher bedf(m_ctx->root_fd());
BtrfsInodeFetcher inode_btf(m_ctx->root_fd());
// Collect extent ref tasks as a series of stand-alone events
// chained after the first task created, then run the first one.
// This prevents other threads from starting to process an
// extent until we have all of its refs in the queue.
const auto refs_list = make_shared<list<ExtentRef>>();
bool found_nocow = false;
bool check_nocow = true;
for (const auto &i : log_ino.m_iors) {
catch_all([&](){
if (check_nocow) {
BEESTRACE("checking inode flags for extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum);
BEESNOTE("checking inode flags for extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum);
const auto inode_item = inode_btf.stat(i.m_root, i.m_inum);
if (!!inode_item) {
const auto flags = inode_item.inode_flags();
check_nocow = false;
if (0 != (flags & BTRFS_INODE_NODATASUM)) {
BEESLOGDEBUG("unsupported inode flags for extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum << ": " << btrfs_inode_flags_ntoa(flags));
found_nocow = true;
return; // from the catch_all
}
}
}
BEESTRACE("mapping extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum << " offset " << to_hex(i.m_offset));
BEESNOTE("mapping extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum << " offset " << to_hex(i.m_offset));
@@ -811,6 +840,11 @@ BeesScanModeExtent::SizeTier::create_extent_map(const uint64_t bytenr, const Pro
refs_list->push_back(extref);
BEESCOUNT(extent_ref_ok);
});
// Completely abandon the extent if it is nodatasum
if (found_nocow) {
BEESCOUNT(extent_nodatasum);
return;
}
}
BEESCOUNT(extent_mapped);
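The control flow of the ref loop above, with its check-once flag and whole-extent bailout, reduces to roughly this shape (hypothetical helper names, btrfs details stripped out): all references to one extent share the csum/no-csum property, so the inode flags are fetched once, and a NODATASUM hit abandons the whole extent.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

constexpr uint64_t NODATASUM_BIT = 1;           // stands in for BTRFS_INODE_NODATASUM

// Returns the refs worth mapping, or an empty list if the extent is nodatasum.
std::vector<uint64_t> filter_refs(const std::vector<uint64_t> &ref_inodes,
                                  const std::function<uint64_t(uint64_t)> &get_flags)
{
	std::vector<uint64_t> keep;
	bool check_flags = true;
	for (const auto ino : ref_inodes) {
		if (check_flags) {
			check_flags = false;                // one lookup decides for all refs
			if (get_flags(ino) & NODATASUM_BIT)
				return {};                      // abandon the extent entirely
		}
		keep.push_back(ino);
	}
	return keep;
}
```

One inode-tree lookup per extent replaces one `open()` per reference, which is the cost saving the commit message describes.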
@@ -899,6 +933,9 @@ BeesScanModeExtent::scan()
{
BEESTRACE("bsm scan");
// Do nothing if we are throttled
if (should_throttle()) return;
unique_lock<mutex> lock(m_mutex);
const auto size_tiers_copy = m_size_tiers;
lock.unlock();
@@ -934,12 +971,14 @@ BeesScanModeExtent::SizeTier::find_next_extent()
// Low-level extent search debugging
shared_ptr<ostringstream> debug_oss;
const bool debug_oss_only_exceptions = true;
#if 0
// Enable a _lot_ of debugging output
debug_oss = make_shared<ostringstream>();
#endif
if (debug_oss) {
BtrfsIoctlSearchKey::s_debug_ostream = debug_oss;
tl_seeker_debug_str = debug_oss;
}
// Write out the stats no matter how we exit
@@ -967,10 +1006,13 @@ BeesScanModeExtent::SizeTier::find_next_extent()
);
}
if (debug_oss) {
BEESLOGDEBUG("debug oss trace:\n" << debug_oss->str());
if (!debug_oss_only_exceptions || exception_check()) {
BEESLOGDEBUG("debug oss trace:\n" << debug_oss->str());
}
}
}
BtrfsIoctlSearchKey::s_debug_ostream.reset();
tl_seeker_debug_str.reset();
});
#define MNE_DEBUG(x) do { \
@@ -1003,7 +1045,9 @@ BeesScanModeExtent::SizeTier::find_next_extent()
// There is a lot of debug output. Dump it if it gets too long
if (!debug_oss->str().empty()) {
if (crawl_time.age() > 1) {
BEESLOGDEBUG("debug oss trace (so far):\n" << debug_oss->str());
if (!debug_oss_only_exceptions) {
BEESLOGDEBUG("debug oss trace (so far):\n" << debug_oss->str());
}
debug_oss->str("");
}
}
@@ -1084,7 +1128,6 @@ BeesScanModeExtent::SizeTier::find_next_extent()
++size_low_count;
// Skip ahead over any below-min-size extents
BEESTRACE("min_size " << pretty(lower_size_bound) << " > scale_size " << pretty(m_fetcher.scale_size()));
const auto lsb_rounded = lower_size_bound & ~(m_fetcher.scale_size() - 1);
// Don't bother doing backward searches when skipping 128K or less.
// The search will cost more than reading 32 consecutive extent records.
@@ -1148,7 +1191,7 @@ BeesScanModeExtent::SizeTier::find_next_extent()
const auto hold_state = m_crawl->hold_state(this_state);
const auto sft = shared_from_this();
ostringstream oss;
oss << "map_" << to_hex(this_bytenr) << "_" << pretty(this_length);
oss << "map_" << hex << this_bytenr << dec << "_" << pretty(this_length);
Task create_map_task(oss.str(), [sft, this_bytenr, hold_state, this_length, find_next_task]() {
sft->create_extent_map(this_bytenr, hold_state, this_length, find_next_task);
BEESCOUNT(crawl_extent);
@@ -1299,8 +1342,8 @@ BeesScanModeExtent::next_transid()
const auto this_crawl = found->second->crawl();
THROW_CHECK1(runtime_error, subvol, this_crawl);
// Get the last _completed_ state
const auto this_state = this_crawl->get_state_begin();
// Get the last _queued_ state
const auto this_state = this_crawl->get_state_end();
auto bytenr = this_state.m_objectid;
const auto bg_found = bg_info_map.lower_bound(bytenr);
@@ -1323,23 +1366,25 @@ BeesScanModeExtent::next_transid()
}
const auto bytenr_offset = min(bi_last_bytenr, max(bytenr, bi.first_bytenr)) - bi.first_bytenr + bi.first_total;
const auto bytenr_norm = bytenr_offset / double(fs_size);
const auto time_so_far = now - min(now, this_state.m_started);
const auto eta_start = min(now, this_state.m_started);
const auto time_so_far = now - eta_start;
const string start_stamp = strf_localtime(this_state.m_started);
string eta_stamp = "-";
string eta_pretty = "-";
const auto &deferred_finished = deferred_map.at(subvol);
const bool finished = deferred_finished.second;
if (finished) {
// eta_stamp = "idle";
if (finished && m_roots->up_to_date(this_state)) {
eta_stamp = "idle";
} else if (time_so_far > 10 && bytenr_offset > 1024 * 1024 * 1024) {
const time_t eta_duration = time_so_far / bytenr_norm;
const time_t eta_time = eta_duration + now;
const time_t eta_time = eta_duration + eta_start;
const time_t eta_remain = eta_time - now;
eta_stamp = strf_localtime(eta_time);
eta_pretty = pretty_seconds(eta_duration);
eta_pretty = pretty_seconds(eta_remain);
}
const auto &mma = mes.m_map.at(subvol);
const auto mma_ratio = mes_sample_size_ok ? (mma.m_bytes / double(mes.m_total)) : 1.0;
const auto posn_text = Table::Text(finished ? "idle" : astringprintf("%06d", int(floor(bytenr_norm * 1000000))));
const auto posn_text = Table::Text(astringprintf("%06d", int(floor(bytenr_norm * 999999))));
const auto size_text = Table::Text( mes_sample_size_ok ? pretty(fs_size * mma_ratio) : "-");
eta.insert_row(Table::endpos, vector<Table::Content> {
Table::Text(magic.m_max_size == numeric_limits<uint64_t>::max() ? "max" : pretty(magic.m_max_size)),
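A worked miniature of the corrected ETA arithmetic above (arbitrary numbers, illustrative names): the projected duration is anchored at the crawl's start time rather than at now, and the countdown shown is finish time minus now.

```cpp
#include <ctime>

struct Eta { time_t finish_time; time_t remaining; };

// Linear projection: if covering bytenr_norm (fraction 0..1) of the
// filesystem took time_so_far seconds, the whole pass takes
// time_so_far / bytenr_norm seconds measured from eta_start.
Eta project_eta(time_t eta_start, time_t now, double bytenr_norm)
{
	const time_t time_so_far = now - eta_start;
	const auto eta_duration = time_t(time_so_far / bytenr_norm);
	const time_t eta_time = eta_duration + eta_start;   // anchored at start, not now
	return Eta{eta_time, eta_time - now};
}
```

E.g. a crawl started 600 s ago that has covered 25% of the filesystem projects a 2400 s total pass, finishing 1800 s from now; the old `eta_duration + now` form would have pushed the finish time 600 s too far out.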
@@ -2303,16 +2348,20 @@ BeesCrawl::BeesCrawl(shared_ptr<BeesContext> ctx, BeesCrawlState initial_state)
}
}
bool
BeesRoots::up_to_date(const BeesCrawlState &bcs)
{
// If we are already at transid_max then we are up to date
return bcs.m_max_transid >= transid_max();
}
bool
BeesCrawl::restart_crawl_unlocked()
{
const auto roots = m_ctx->roots();
const auto next_transid = roots->transid_max();
auto crawl_state = get_state_end();
// If we are already at transid_max then we are still finished
m_finished = crawl_state.m_max_transid >= next_transid;
m_finished = roots->up_to_date(crawl_state);
if (m_finished) {
m_deferred = true;
@@ -2323,7 +2372,7 @@ BeesCrawl::restart_crawl_unlocked()
// Start new crawl
crawl_state.m_min_transid = crawl_state.m_max_transid;
crawl_state.m_max_transid = next_transid;
crawl_state.m_max_transid = roots->transid_max();
crawl_state.m_objectid = 0;
crawl_state.m_offset = 0;
crawl_state.m_started = current_time;

@@ -14,7 +14,7 @@ BeesThread::exec(function<void()> func)
{
m_timer.reset();
BEESLOGDEBUG("BeesThread exec " << m_name);
m_thread_ptr = make_shared<thread>([=]() {
m_thread_ptr = make_shared<thread>([this, func]() {
BeesNote::set_name(m_name);
BEESLOGDEBUG("Starting thread " << m_name);
BEESNOTE("thread function");

@@ -8,21 +8,15 @@ thread_local BeesTracer *BeesTracer::tl_next_tracer = nullptr;
thread_local bool BeesTracer::tl_first = true;
thread_local bool BeesTracer::tl_silent = false;
bool
exception_check()
{
#if __cplusplus >= 201703
return uncaught_exceptions();
#else
return uncaught_exception();
#endif
}
BeesTracer::~BeesTracer()
{

@@ -228,8 +228,10 @@ bees_readahead_check(int const fd, off_t const offset, size_t const size)
auto tup = make_tuple(offset, size, stat_rv.st_dev, stat_rv.st_ino);
static mutex s_recent_mutex;
static set<decltype(tup)> s_recent;
static Timer s_recent_timer;
unique_lock<mutex> lock(s_recent_mutex);
if (s_recent.size() > BEES_MAX_EXTENT_REF_COUNT) {
if (s_recent_timer.age() > 5.0) {
s_recent_timer.reset();
s_recent.clear();
BEESCOUNT(readahead_clear);
}
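The age-based flush above can be sketched in isolation like so: remember recent readahead requests only for as long as their pages plausibly survive in the page cache, clearing the whole set after 5 s (matching the default writeback interval) instead of at an arbitrary element count. `std::chrono` stands in here for crucible's `Timer`.

```cpp
#include <chrono>
#include <set>
#include <tuple>

class RecentReadahead {
	using Key = std::tuple<long long, unsigned long long>;  // (offset, size), simplified
	std::set<Key> m_recent;
	std::chrono::steady_clock::time_point m_last_clear = std::chrono::steady_clock::now();
public:
	// Returns true if this request was already seen recently,
	// so the caller may skip issuing the readahead again.
	bool check_and_insert(long long offset, unsigned long long size,
	                      double max_age_sec = 5.0)
	{
		const auto now = std::chrono::steady_clock::now();
		const std::chrono::duration<double> age = now - m_last_clear;
		if (age.count() > max_age_sec) {
			m_last_clear = now;
			m_recent.clear();                   // assume the page cache has moved on
		}
		return !m_recent.insert(Key(offset, size)).second;
	}
};
```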
@@ -253,7 +255,7 @@ bees_readahead_nolock(int const fd, const off_t offset, const size_t size)
// The btrfs kernel code does readahead with lower ioprio
// and might discard the readahead request entirely.
BEESNOTE("emulating readahead " << name_fd(fd) << " offset " << to_hex(offset) << " len " << pretty(size));
auto working_size = size;
auto working_size = min(size, uint64_t(128 * 1024 * 1024));
auto working_offset = offset;
while (working_size) {
// don't care about multithreaded writes to this buffer--it is garbage anyway
@@ -423,7 +425,7 @@ bees_fsync(int const fd)
// can fill in the f_type field.
struct statfs stf = { 0 };
DIE_IF_NON_ZERO(fstatfs(fd, &stf));
if (stf.f_type != BTRFS_SUPER_MAGIC) {
if (static_cast<decltype(BTRFS_SUPER_MAGIC)>(stf.f_type) != BTRFS_SUPER_MAGIC) {
BEESLOGONCE("Using fsync on non-btrfs filesystem type " << to_hex(stf.f_type));
BEESNOTE("fsync non-btrfs " << name_fd(fd));
DIE_IF_NON_ZERO(fsync(fd));
@@ -502,6 +504,23 @@ BeesTempFile::resize(off_t offset)
// Count time spent here
BEESCOUNTADD(tmp_resize_ms, resize_timer.age() * 1000);
// Modify flags - every time
// - btrfs will keep trying to set FS_NOCOMP_FL behind us when compression heuristics identify
// the data as compressible, but it fails to compress
// - clear FS_NOCOW_FL because we can only dedupe between files with the same FS_NOCOW_FL state,
// and we don't open FS_NOCOW_FL files for dedupe.
BEESTRACE("Getting FS_COMPR_FL and FS_NOCOMP_FL on m_fd " << name_fd(m_fd));
int flags = ioctl_iflags_get(m_fd);
const auto orig_flags = flags;
flags |= FS_COMPR_FL;
flags &= ~(FS_NOCOMP_FL | FS_NOCOW_FL);
if (flags != orig_flags) {
BEESTRACE("Setting FS_COMPR_FL and clearing FS_NOCOMP_FL | FS_NOCOW_FL on m_fd " << name_fd(m_fd) << " flags " << to_hex(flags));
ioctl_iflags_set(m_fd, flags);
}
// That may have queued some delayed ref deletes, so throttle them
bees_throttle(resize_timer.age(), "tmpfile_resize");
}
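Assuming `ioctl_iflags_get`/`ioctl_iflags_set` wrap the standard `FS_IOC_GETFLAGS`/`FS_IOC_SETFLAGS` ioctls, the read-modify-write-if-changed pattern above looks roughly like this in standalone form (Linux-only sketch, error handling reduced to a bool):

```cpp
#include <linux/fs.h>      // FS_COMPR_FL, FS_NOCOMP_FL, FS_NOCOW_FL, FS_IOC_*FLAGS
#include <sys/ioctl.h>

// The flag arithmetic from the hunk above, isolated: force compression
// on, clear the flags that fight the compression heuristic or make the
// file ineligible for dedupe against datacow files.
static int desired_iflags(int flags)
{
	flags |= FS_COMPR_FL;
	flags &= ~(FS_NOCOMP_FL | FS_NOCOW_FL);
	return flags;
}

// Skip the SETFLAGS ioctl when nothing would change, since the write
// dirties the inode every time it is issued.
static bool make_dedupe_friendly(int fd)
{
	int flags = 0;
	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) != 0) return false;
	int new_flags = desired_iflags(flags);
	if (new_flags == flags) return true;        // already in the desired state
	return ioctl(fd, FS_IOC_SETFLAGS, &new_flags) == 0;
}
```

Re-applying this on every resize (rather than once at creation, as the removed hunk below did) is what defeats the kernel setting `FS_NOCOMP_FL` behind bees' back.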
@@ -543,13 +562,6 @@ BeesTempFile::BeesTempFile(shared_ptr<BeesContext> ctx) :
// Add this file to open_root_ino lookup table
m_roots->insert_tmpfile(m_fd);
// Set compression attribute
BEESTRACE("Getting FS_COMPR_FL on m_fd " << name_fd(m_fd));
int flags = ioctl_iflags_get(m_fd);
flags |= FS_COMPR_FL;
BEESTRACE("Setting FS_COMPR_FL on m_fd " << name_fd(m_fd) << " flags " << to_hex(flags));
ioctl_iflags_set(m_fd, flags);
// Count time spent here
BEESCOUNTADD(tmp_create_ms, create_timer.age() * 1000);
@@ -741,7 +753,7 @@ bees_main(int argc, char *argv[])
BEESLOGDEBUG("exception (ignored): " << s);
BEESCOUNT(exception_caught_silent);
} else {
BEESLOGNOTICE("\n\nTRACE: *** EXCEPTION ***\n\t" << s << "\n***\n");
BEESLOG(BEES_TRACE_LEVEL, "TRACE: EXCEPTION: " << s);
BEESCOUNT(exception_caught);
}
});

@@ -588,8 +588,8 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
void current_state_set(const BeesCrawlState &bcs);
bool crawl_batch(shared_ptr<BeesCrawl> crawl);
void clear_caches();
shared_ptr<BeesCrawl> insert_root(const BeesCrawlState &bcs);
bool up_to_date(const BeesCrawlState &bcs);
friend class BeesCrawl;
friend class BeesFdCache;
@@ -901,5 +901,6 @@ void bees_readahead_pair(int fd, off_t offset, size_t size, int fd2, off_t offse
void bees_unreadahead(int fd, off_t offset, size_t size);
void bees_throttle(double time_used, const char *context);
string format_time(time_t t);
bool exception_check();
#endif

@@ -19,7 +19,9 @@ seeker_finder(const vector<uint64_t> &vec, uint64_t lower, uint64_t upper)
if (ub != s.end()) ++ub;
if (ub != s.end()) ++ub;
for (; ub != s.end(); ++ub) {
if (*ub > upper) break;
if (*ub > upper) {
break;
}
}
return set<uint64_t>(lb, ub);
}
@@ -28,7 +30,7 @@ static bool test_fails = false;
static
void
seeker_test(const vector<uint64_t> &vec, uint64_t const target)
seeker_test(const vector<uint64_t> &vec, uint64_t const target, bool const always_out = false)
{
cerr << "Find " << target << " in {";
for (auto i : vec) {
@@ -36,11 +38,13 @@ seeker_test(const vector<uint64_t> &vec, uint64_t const target)
}
cerr << " } = ";
size_t loops = 0;
tl_seeker_debug_str = make_shared<ostringstream>();
bool local_test_fails = false;
bool excepted = catch_all([&]() {
auto found = seek_backward(target, [&](uint64_t lower, uint64_t upper) {
const auto found = seek_backward(target, [&](uint64_t lower, uint64_t upper) {
++loops;
return seeker_finder(vec, lower, upper);
});
}, uint64_t(32));
cerr << found;
uint64_t my_found = 0;
for (auto i : vec) {
@@ -52,13 +56,15 @@ seeker_test(const vector<uint64_t> &vec, uint64_t const target)
cerr << " (correct)";
} else {
cerr << " (INCORRECT - right answer is " << my_found << ")";
test_fails = true;
local_test_fails = true;
}
});
cerr << " (" << loops << " loops)" << endl;
if (excepted) {
test_fails = true;
if (excepted || local_test_fails || always_out) {
cerr << dynamic_pointer_cast<ostringstream>(tl_seeker_debug_str)->str();
}
test_fails = test_fails || local_test_fails;
tl_seeker_debug_str.reset();
}
static
@@ -89,6 +95,39 @@ test_seeker()
seeker_test(vector<uint64_t> { 0, numeric_limits<uint64_t>::max() }, numeric_limits<uint64_t>::max());
seeker_test(vector<uint64_t> { 0, numeric_limits<uint64_t>::max() }, numeric_limits<uint64_t>::max() - 1);
seeker_test(vector<uint64_t> { 0, numeric_limits<uint64_t>::max() - 1 }, numeric_limits<uint64_t>::max());
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 0);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 1);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 2);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 3);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 4);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 5);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 6);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 7);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 8);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, 9);
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 1 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 2 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 3 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 4 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 5 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 6 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 7 );
seeker_test(vector<uint64_t> { 0, 1, 2, 4, 8 }, numeric_limits<uint64_t>::max() - 8 );
// Pulled from a bees debug log
seeker_test(vector<uint64_t> {
6821962845,
6821962848,
6821963411,
6821963422,
6821963536,
6821963539,
6821963835, // <- appeared during the search, causing an exception
6821963841,
6822575316,
}, 6821971036, true);
}