mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

493 Commits

Author SHA1 Message Date
Zygo Blaxell
2f14a5a9c7 roots: reduce number of objects per TREE_SEARCH_V2, drop BEES_MAX_CRAWL_ITEMS and BEES_MAX_CRAWL_BYTES
This makes better use of dynamic buffer sizing, and reduces the amount
of stale data lying around.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-31 19:42:01 -04:00
Zygo Blaxell
cf4091b352 endian: fix uint16_t specialization of le_to_cpu
Fortunately, we have not had cause to read any 16-bit fields out of
btrfs structures yet.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-31 19:42:01 -04:00
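A minimal C++ sketch of what a correct set of le_to_cpu specializations can look like, with each width calling the conversion of the matching width. This is not bees' actual code; it assumes glibc's le16toh/le32toh/le64toh from <endian.h>, and the get_le helper is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <endian.h>  // glibc: le16toh, le32toh, le64toh

// Primary template is declared but not defined, so unsupported widths fail to link.
template <class T> T le_to_cpu(T x);

// Each specialization must use the conversion for its own width.
template <> inline uint16_t le_to_cpu(uint16_t x) { return le16toh(x); }
template <> inline uint32_t le_to_cpu(uint32_t x) { return le32toh(x); }
template <> inline uint64_t le_to_cpu(uint64_t x) { return le64toh(x); }

// Hypothetical helper: read a little-endian field from a raw byte buffer.
template <class T> T get_le(const uint8_t *p) {
    T v;
    std::memcpy(&v, p, sizeof v);  // no alignment assumptions on p
    return le_to_cpu(v);
}
```

The same bytes decode to the same value on little- and big-endian hosts, which is exactly what a broken uint16_t specialization would fail to do.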
Zygo Blaxell
587870911f roots: use const more
Mark local variables that can be const as const.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-31 19:42:01 -04:00
Zygo Blaxell
d384f3eec0 roots: ignore subvol when it is read-only and send workaround is enabled
Previously, when the bees send workaround is enabled, bees would
immediately advance the subvol's crawl status as if the entire subvol
had been scanned.

If the subvol is later made read-write, or if the workaround is disabled,
bees sees that the subvol has already been marked as scanned.  This is
an unfortunate result if the subvol is inadvertently marked read-only
or if bees is inadvertently run with the send workaround disabled.

Instead, (almost) completely ignore the subvol:  don't advance the crawl
pointer, don't consider the subvol in the list of searchable roots, and
don't consider the subvol when calculating min_transid for new subvols.

The "almost" part is:  if the subvol scan has not yet started, keep its
start timestamp current so it won't mess up subvol traversal performance
metrics.

Also handle exceptions while determining whether a subvol is read-only,
as those apparently do happen.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-31 19:42:01 -04:00
gin66
596f2c7dbf Remove duplicated //etc for make install
install -Dm644 scripts/beesd.conf.sample $(DESTDIR)/$(ETC_PREFIX)/bees/beesd.conf.sample
will expand to //etc/bees/beesd.conf.sample. This patch removes the duplicated /.
2021-10-31 10:41:56 +01:00
Zygo Blaxell
84adbaecf9 beesd: add missing RuntimeDirectory
Since we started locking down the beesd service, we no longer have
privileges to do some things.  Have systemd do it for us instead.

Fixes: #195
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-14 21:13:33 -04:00
Zygo Blaxell
12e80658a8 fs: fix FIEMAP_MAX_OFFSET type silliness in fiemap.h
In fiemap.h the members of struct fiemap are declared as __u64, but the
FIEMAP_MAX_OFFSET macro is an unsigned long long value:

	$ grep FIEMAP_MAX_OFFSET -r /usr/include/
	/usr/include/linux/fiemap.h:#define FIEMAP_MAX_OFFSET   (~0ULL)
	$ grep fe_length -r /usr/include/
	/usr/include/linux/fiemap.h:    __u64 fe_length;   /* length in bytes for this extent */

This results in a type mismatch error on architectures like ppc64le:

	fiemap.cc:31:35: note:   deduced conflicting types for parameter 'const _Tp' ('long unsigned int' and 'long long unsigned int')
	    31 |                 fm.fm_length = min(fm.fm_length, FIEMAP_MAX_OFFSET - fm.fm_start);
	       |                                ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Work around this by copying the macro into a uint64_t constant,
and not using the macro any more.

Fixes: https://github.com/Zygo/bees/issues/194

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-06 15:17:02 -04:00
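A short C++ sketch of the workaround: the macro is copied once into a uint64_t constant, and the clamp is computed entirely in uint64_t so std::min never sees two different integer types. The constant and function names are hypothetical, not taken from bees.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <linux/fiemap.h>

// Copy the macro into a typed constant once; code below never uses the macro.
static const uint64_t FIEMAP_MAX_OFFSET_U64 = FIEMAP_MAX_OFFSET;

// Both std::min arguments are uint64_t, so template deduction agrees on every
// architecture, whatever the underlying types of __u64 and ~0ULL happen to be.
inline uint64_t clamp_fiemap_length(uint64_t start, uint64_t length) {
    return std::min(length, FIEMAP_MAX_OFFSET_U64 - start);
}
```

A caller would then write `fm.fm_length = clamp_fiemap_length(fm.fm_start, fm.fm_length);`, letting the __u64 members convert implicitly at the call boundary.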
Zygo Blaxell
b436f8483b docs: add readahead_ event group
readahead and unreadahead have new event counters.  Document them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
v0.7
2021-10-04 20:44:25 -04:00
Zygo Blaxell
a353d8cc6e hash: use POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED
The hash table is one of the few cases in bees where a non-trivial amount
of page cache memory will be used in a predictable way, so we can advise
the kernel about our IO demands in advance.

Use WILLNEED to prefetch hash table pages at startup.

Use DONTNEED to trigger writeback on hash table pages at shutdown.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-04 20:41:09 -04:00
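A minimal sketch of issuing these two hints with posix_fadvise(). The helper names are hypothetical; a length of 0 means "to the end of the file". Note that posix_fadvise() reports failure through its return value rather than errno. On Linux, DONTNEED starts writeback of dirty pages and drops clean ones from the page cache.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <system_error>
#include <unistd.h>

// posix_fadvise returns an error number directly; it does not set errno.
inline void advise_or_throw(int fd, off_t offset, off_t len, int advice) {
    int rv = posix_fadvise(fd, offset, len, advice);
    if (rv != 0) {
        throw std::system_error(rv, std::generic_category(), "posix_fadvise");
    }
}

// Startup (hypothetical name): ask the kernel to start reading the whole table.
inline void prefetch_table(int fd) { advise_or_throw(fd, 0, 0, POSIX_FADV_WILLNEED); }

// Shutdown (hypothetical name): start writeback and let the cached pages go.
inline void release_table(int fd) { advise_or_throw(fd, 0, 0, POSIX_FADV_DONTNEED); }
```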
Zygo Blaxell
97d70ef4c5 bees: readahead() in the kernel is posix_fadvise(..., POSIX_FADV_WILLNEED)
In theory, we don't need the pread() loop, because the kernel will do a
better job with readahead().

In practice, we might still need the pread() code, as the readahead will
occur at idle IO priority, which could adversely affect bees performance.

More testing is required.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-04 20:21:01 -04:00
Zygo Blaxell
a9cd19a5fe fs: avoid unaligned access when copying btrfs search headers
The assignment operator will use member-wise assignment, which
assumes the object's this pointer is aligned.  That doesn't
happen when the object in question is part of a btrfs search
result, and aarch64 faults over it.

Use memcpy instead, which has no alignment constraints.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-10-04 20:19:00 -04:00
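A small illustration of the memcpy approach. Dereferencing a struct pointer at an odd offset is undefined behaviour in C++ and can fault on strict-alignment hardware, while memcpy into a properly aligned local works at any offset. SearchHeader is a hypothetical stand-in that mirrors the field list of the kernel's search header, not bees' own type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a btrfs search result header.
struct SearchHeader {
    uint64_t transid;
    uint64_t objectid;
    uint64_t offset;
    uint32_t type;
    uint32_t len;
};

// p may point anywhere inside a result buffer, including odd offsets.
inline SearchHeader read_header(const uint8_t *p) {
    SearchHeader h;                     // aligned local object
    std::memcpy(&h, p, sizeof h);       // byte copy: no alignment constraint on p
    return h;                           // instead of `return *(const SearchHeader *)p;`
}
```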
Jiahao XU
69c3d99552 Rm MOUNT_OPTIONS for it is of no use and dangerous
Btrfs mount options affect all mount points using the same btrfs
filesystem, so specifying them per mount is useless.

Also, common mount options like `noatime,nosuid,nodev,noexec` have little
to no effect on beesd, so it's better and simpler to remove this option.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
2021-10-04 20:19:00 -04:00
Jiahao XU
ccec63104c Update default MOUNT_OPTIONS beesd.in
`noatime` to avoid updating atime;
`nodev,noexec,nosuid` for the pedantic.
2021-10-04 20:19:00 -04:00
Jiahao XU
951b5ce360 Fix typo when setting default val of MOUNT_OPTIONS in beesd.in
Fixed mistake in #188
2021-10-04 20:18:55 -04:00
Jiahao XU
f2c65f2f4b Update comment in beesd@.service.in
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
2021-09-04 21:20:05 +10:00
Jiahao XU
c79eb1d704 Further sandbox beesd using systemd.exec options
I've verified that using this setup, the user will be able to access the log
in /run/bees, but cannot access the mounted filesystem.

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
2021-09-04 17:40:13 +10:00
Zygo Blaxell
522e52618e context: calculate TOTAL RATES correctly
The denominator for TOTAL RATES is the total running time, not the delta
running time.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-08-30 18:23:42 -04:00
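The distinction in this fix can be shown with a small sketch. The struct and function names are hypothetical. With 6000 events over 600 seconds of total running time, and 300 of those events in the last 60 seconds, the interval rate is 5/s and the total rate is 10/s. Dividing the total count by the 60-second delta instead would report 100/s, ten times too high.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters for one statistic in a periodic report.
struct RateSample {
    uint64_t total_count;   // events since startup
    double total_seconds;   // running time since startup
    uint64_t delta_count;   // events since the previous report
    double delta_seconds;   // time since the previous report
};

// Interval rate: interval count over interval time.
inline double delta_rate(const RateSample &s) {
    return s.delta_seconds > 0 ? s.delta_count / s.delta_seconds : 0.0;
}

// Total rate: the denominator is the total running time, not the delta.
inline double total_rate(const RateSample &s) {
    return s.total_seconds > 0 ? s.total_count / s.total_seconds : 0.0;
}
```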
Jiahao XU
4a3d3e7a43 Modify systemd unit and beesd.in to use private mnt namespace
to:
 - avoid influencing the global mount namespace
 - auto umount upon exit of this unit

Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
2021-08-30 18:23:38 -04:00
Jiahao XU
13abf8aada Add new options MOUNT_OPTIONS
Signed-off-by: Jiahao XU <Jiahao_XU@outlook.com>
[trailing whitespace deleted]
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-08-30 18:22:30 -04:00
Kai Krakow
081a6af278 bees: Avoid unused result with -Werror=unused-result
Fixes: commit 20b8f8ae0b392 ("bees: use helper function for readahead")
Signed-off-by: Kai Krakow <kai@kaishome.de>
2021-06-19 10:35:28 +02:00
Zygo Blaxell
3d95460eb7 fiemap: don't force flush so we can see the delalloc shenanigans
Like filefrag, fiemap was defaulting to FIEMAP_FLAG_SYNC, and providing no
option to turn it off.  This prevents observation of delayed allocations,
making fiemap less useful.

Override the default flag setting so fiemap gets the current
(i.e. unflushed) extent map state.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 21:09:14 -04:00
Zygo Blaxell
d9e3c0070b context: stop creating new refs when there are too many already
LOGICAL_INO_V2 has a maximum limit of 655050 references per extent.
Although it no longer has a crippling performance problem, at roughly
two seconds per extent, it's too slow to be useful.

When an extent gains an absurd number of references, stop making any
more.  Returning zero extent refs will make bees believe the extent
was deleted, and it will remove the block from the hash table.

This helps speed processing of highly duplicated large files like
VM images, at the cost of a slightly lower dedupe hit rate.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 21:05:55 -04:00
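A minimal sketch of the cutoff described above. The commit does not state the threshold bees uses, so MAX_USEFUL_REFS is an arbitrary placeholder, and ExtentRef and filter_refs are hypothetical names. Returning an empty list mimics the "extent was deleted" signal that causes the block to be dropped from the hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ExtentRef { uint64_t root, inode, offset; };  // hypothetical

// Placeholder threshold; not the value bees uses.
constexpr size_t MAX_USEFUL_REFS = 10000;

// At or above the limit, report no refs at all. Downstream code treats an
// empty result like a deleted extent, so no new references get created.
inline std::vector<ExtentRef> filter_refs(std::vector<ExtentRef> refs,
                                          size_t limit = MAX_USEFUL_REFS) {
    if (refs.size() >= limit) {
        refs.clear();
    }
    return refs;
}
```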
Zygo Blaxell
955b8ae459 task: set the name of consumer threads so it is not "load_tracker"
The default name of a newly constructed thread is apparently the name
of the thread that created it.  That's very misleading when there are
a lot of TaskConsumer threads and they have nothing to do, so set the
name of each TaskConsumer thread as soon as it is created.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 21:02:00 -04:00
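A Linux-specific sketch of naming a worker thread as its first action, using glibc's pthread_setname_np()/pthread_getname_np(). The kernel limits thread names to 15 characters plus the terminating NUL, so the name is truncated first. The helper names are hypothetical and not bees' TaskConsumer code.

```cpp
#include <cassert>
#include <pthread.h>
#include <string>
#include <thread>

// Thread names are limited to 15 characters plus NUL; longer names fail with ERANGE.
inline void set_current_thread_name(const std::string &name) {
    const std::string shortname = name.substr(0, 15);
    pthread_setname_np(pthread_self(), shortname.c_str());
}

inline std::string get_current_thread_name() {
    char buf[16] = {};
    pthread_getname_np(pthread_self(), buf, sizeof buf);
    return buf;
}

// Hypothetical worker: name itself before doing anything else, so it never
// shows up under the name inherited from the thread that created it.
inline std::string run_named_worker(const std::string &name) {
    std::string seen;
    std::thread t([&] {
        set_current_thread_name(name);
        seen = get_current_thread_name();
    });
    t.join();
    return seen;
}
```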
Zygo Blaxell
08899052ad trace: current_exception() is not a replacement for uncaught_exception()
In 15ab981d9e "bees: replace uncaught_exception(), deprecated in C++17",
uncaught_exception() was replaced with current_exception(); however,
current_exception() is only valid after an exception has been captured
by a catch block.

BeesTracer wants to know about exceptions _before_ they are caught,
so current_exception() is not useful here.

Instead, conditionally compile using uncaught_exception() or
uncaught_exceptions(), selected by C++ standard version, and make
bees stack traces work again.

Fixes: 15ab981d9e "bees: replace uncaught_exception(), deprecated in C++17"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
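A sketch of the conditional compilation described above, together with a hypothetical RAII tracer that records its context only when destroyed during stack unwinding. In that situation std::current_exception() would typically return a null exception_ptr, because the exception has not yet reached a catch block. ScopeTracer is an illustrative name, not BeesTracer.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// True while an exception has been thrown but not yet caught (stack unwinding).
inline bool unwinding_due_to_exception() {
#if __cplusplus >= 201703L
    return std::uncaught_exceptions() > 0;   // C++17 and later
#else
    return std::uncaught_exception();        // deprecated in C++17, removed in C++20
#endif
}

// Hypothetical tracer: appends its context to log only during unwinding.
class ScopeTracer {
public:
    ScopeTracer(std::vector<std::string> &log, std::string what)
        : m_log(log), m_what(std::move(what)) {}
    ~ScopeTracer() {
        if (!unwinding_due_to_exception()) return;
        try {
            m_log.push_back(m_what);
        } catch (...) {
            // An exception escaping a destructor would call std::terminate.
        }
    }
private:
    std::vector<std::string> &m_log;
    std::string m_what;
};
```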
Zygo Blaxell
03532effed trace: move BeesTrace and BeesNote into their own translation unit
This allows these components to be used by test executables without
pulling in all of bees, so their code can be iterated on more rapidly.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
6adaedeecd extentwalker: fix the binary search and add some debug infrastructure
Add some conditionally-compiled debug code, including an in-memory log
of what ExtentWalker does.  Dump that log on exceptions.

If we loop too many times in a debug build, kill the process so we can
stack trace.  In non-debug builds just throw a normal exception.

Grow the step size instead of shrinking it, to reduce the number of
binary search iterations.

Prevent a bug where the step size bottoms out before positioning the
target extent in the middle of the result vector.

Use the first extent for "first_extent", instead of the 3rd.

Get rid of some redundant checks.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
54f03a0297 extentwalker: fix missing characters
"C" in LOGICAL_INO, and avoid writing "flags=" in the log.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
52279656cf extentwalker: fix the hole position logic
When a file ends with a hole, ExtentWalker synthesizes a hole extent record
to cover the distance between the last ipos and EOF.  Unfortunately, ipos
was incremented by the number of items in the result vector instead.  Fix
that by incrementing by hole_extent.size().

While we're here, fix up some of the other data quality logic, including
a useless THROW_CHECK that was nothing but workarounds for earlier bugs.

Fixes: https://github.com/Zygo/bees/issues/26
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
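A simplified sketch of the corrected position update. After synthesizing a trailing hole, the file position advances by the hole's size, not by the number of records in the result vector. Extent and append_trailing_hole are hypothetical and much simpler than ExtentWalker's real types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical extent record: byte range [begin, end) within the file.
struct Extent {
    uint64_t begin, end;
    bool hole;
    uint64_t size() const { return end - begin; }
};

// Cover [ipos, eof) with a synthetic hole and move ipos to EOF.
inline void append_trailing_hole(std::vector<Extent> &out, uint64_t &ipos, uint64_t eof) {
    if (ipos >= eof) return;
    const Extent hole_extent{ipos, eof, true};
    out.push_back(hole_extent);
    ipos += hole_extent.size();   // the bug was effectively: ipos += out.size();
}
```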
Zygo Blaxell
1fd26a03b2 tracer: annotate both ends of the stack trace
Add a matching "--- BEGIN TRACE..." line to complement the "---  END
TRACE..." line.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
b083003cf7 docs: update kernel bugs table as of 5.12.3
Two new tree mod log bugs #5 and #6 (uncovered by the zoned IO work,
though #6 has been seen in the wild on 5.10.29).

Tweak the text of some of the workarounds.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
b2d4a07c6f roots: add a TRACE for transid_max search and crawl_transid thread
Some users are hitting an exception somewhere in crawl_transid, which
forces bees to return to the transid_max calculation over and over.
Some users are also seeing out-of-range transids.

Add some BEESTRACE so we can see what we were doing in the exception
handler.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
7008c74113 bees: trace and log improvements during roots and context startup
Currently if crawl throws an exception, we don't have basic information
about what was being crawled or even if the crawler was running at all.

These traces also help identify the causes of early exception failures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
5f0f7a8319 bees: increase StringFile size limit
If we are going to dedupe thousands of subvols, we are going to need a
bigger beescrawl.dat.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
ee86b585a5 bees: use a reserved symbol name in BEESLOG
"c" could be a local variable name, which would do interesting things
to some log messages.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
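A sketch of the macro hygiene problem. If a logging macro's temporary were named `c`, a caller writing `SKETCH_LOG("c = " << c)` would refer to the macro's own temporary instead of the caller's variable. Giving the temporary a long, prefixed name avoids the collision. SKETCH_LOG and last_log_message are hypothetical and are not the BEESLOG implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sink for this sketch: holds the last message logged.
inline std::string &last_log_message() {
    static std::string msg;
    return msg;
}

// The temporary's name is one that callers will not choose by accident.
#define SKETCH_LOG(x) do { \
        std::ostringstream sketch_log_stream_; \
        sketch_log_stream_ << x; \
        last_log_message() = sketch_log_stream_.str(); \
    } while (0)
```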
Zygo Blaxell
cf4b5417c9 context: remove unnecessary copies
These were added while debugging a crash that was fixed years ago.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
77ef6a0638 roots: split constructor into separate start method
This allows us to use the fd cache and inode resolve functions
without starting crawler threads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
0f0da21198 context: track record extent reference counts
This might be interesting information, though most of the motivation for
this evaporated when kernel 5.7 came out.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
8a70bca011 bees: misc comment updates
These have been accumulating in unpublished bees commits.  Squash them all
into one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
20b8f8ae0b bees: use helper function for readahead
There seem to be multiple ways to do readahead in Linux, and only some
of them work.  Hopefully reading the actual data is one of them.

This is an attempt to avoid page-by-page reads in the generic dedupe code.
We load both extents into the VFS cache (read sequentially) and hope they
are still there by the time we call dedupe on them.

We also call readahead(2) and hopefully that either helps or does nothing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
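A Linux-specific sketch of a readahead helper that does both things described above. It issues the advisory readahead(2) and ignores any failure, then reads the range sequentially so the data is in the page cache before dedupe. readahead(2) is declared in <fcntl.h> when _GNU_SOURCE is defined, which g++ does by default. The helper name and the 64 KiB buffer size are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: returns the number of bytes actually read (stops at EOF).
inline size_t readahead_and_read(int fd, off_t offset, size_t size) {
    // Advisory only: may do nothing, and errors are deliberately ignored.
    (void)readahead(fd, offset, size);

    // Read the data for real, sequentially, to populate the page cache.
    std::vector<char> buf(64 * 1024);
    size_t total = 0;
    while (total < size) {
        const size_t want = std::min(buf.size(), size - total);
        const ssize_t rv = pread(fd, buf.data(), want, offset + static_cast<off_t>(total));
        if (rv < 0) {
            if (errno == EINTR) continue;
            break;                      // give up quietly; this is only a cache warm-up
        }
        if (rv == 0) break;             // EOF
        total += static_cast<size_t>(rv);
    }
    return total;
}
```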
Zygo Blaxell
0afd2850f4 cache: emit log messages when clearing FD cache
This enables us to correlate FD cache clears with external events such
as btrfs inode eviction storms.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:46 -04:00
Zygo Blaxell
ffac407a9b roots: clean up crawl_master
Remove some broken #if 0 code, and take advantage of new Task
non-repeating execution semantics.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
4f032ab85b context: report Task instance count
Report the number of Task objects that currently exist as well as the number
on the global work queue.

	THREADS (work queue 298 of 2385 tasks, 16 workers):

This helps spot leaks, since Task objects that are blocked on other Task
post-exec queues are otherwise invisible.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
5f763f6d41 task: handle thread lifecycle more strictly
Testing sometimes crashes during exec of the first Task object, which
triggers construction of TaskConsumer threads.  Manage the life cycle
of the thread more strictly--don't access any methods of TaskConsumer
or std::thread until the constructor's caller's lock on TaskMaster
is released.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
0928362aab task: replace waiting state with run/exec counter
Task::run() would schedule a new execution of Task, unless it was waiting
on a queue for execution.  This cannot be implemented with a bool,
since a Task might be included in multiple queues, and should still be
in waiting state even when executed in that case.

Replace the bool with a counter.  run() and append() (but not
append_nolock) increment the counter, exec() decrements the counter.
If the counter is non-zero when run() or append() is called, the Task
is not scheduled.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
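A simplified sketch of replacing a "waiting" bool with a counter. It models only run() and exec() on a single queue. append() and multiple queues from the commit are not modeled, and CountedTask is a hypothetical class, not bees' Task. Decrementing before running the function is a design choice here: a run() issued while the task is executing queues exactly one follow-up execution.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class CountedTask {
public:
    explicit CountedTask(std::function<void()> fn) : m_fn(std::move(fn)) {}

    // Queue for execution unless an execution is already pending.
    bool run(std::deque<CountedTask *> &queue) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending != 0) return false;   // already scheduled: don't queue twice
        ++m_pending;
        queue.push_back(this);
        return true;
    }

    // Called by a worker after dequeuing the task.
    void exec() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            --m_pending;
        }
        m_fn();
    }

private:
    std::function<void()> m_fn;
    std::mutex m_mutex;
    unsigned m_pending = 0;
};
```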
Zygo Blaxell
d5ff35eacf task: track number of Task objects in program and provide report
This is a simple lightweight counter that tracks the number of Task
objects that exist.  Useful for leak detection.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
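A sketch of a lightweight live-instance counter, plus a report line shaped like the THREADS header shown in 4f032ab85b. InstanceCounter, TaskState, and threads_header are hypothetical names. Relaxed atomics are enough here because the count is only used for reporting.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts live objects of each class that inherits from InstanceCounter<Self>.
template <class Tag>
class InstanceCounter {
public:
    InstanceCounter() { s_count.fetch_add(1, std::memory_order_relaxed); }
    InstanceCounter(const InstanceCounter &) { s_count.fetch_add(1, std::memory_order_relaxed); }
    InstanceCounter &operator=(const InstanceCounter &) = default;
    ~InstanceCounter() { s_count.fetch_sub(1, std::memory_order_relaxed); }
    static size_t count() { return s_count.load(std::memory_order_relaxed); }
private:
    static std::atomic<size_t> s_count;
};
template <class Tag> std::atomic<size_t> InstanceCounter<Tag>::s_count{0};

struct TaskState : InstanceCounter<TaskState> { /* task fields would go here */ };

// Report queued tasks against all live tasks; the gap hints at blocked or leaked ones.
inline std::string threads_header(size_t queued, size_t workers) {
    std::ostringstream oss;
    oss << "THREADS (work queue " << queued << " of "
        << InstanceCounter<TaskState>::count() << " tasks, " << workers << " workers):";
    return oss.str();
}
```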
Zygo Blaxell
b7f9ce3f08 task: serialize Task execution when Tasks block due to mutex contention
Quite often we want to execute task B after task A finishes executing,
especially if tasks A and B attempt to acquire locks on the same objects.

Implement that capability in Task directly:  each Task holds a queue
of Tasks which will be executed strictly after this Task has finished
executing, or if the Task is destroyed.

Add a local queue to each TaskConsumer.  This queue contains a list
of Tasks which are to be executed by a single thread in sequential
order.  These tasks are executed before fetching any tasks from
TaskMaster.

Each time a Task finishes executing, the list of tasks appended to the
recently executed Task is spliced at the beginning of the thread's
TaskConsumer local queue.  These tasks will be executed in the same
thread in the same order they were appended to the recently executed Task.

If a Task is destroyed with a post-execution queue, that queue is
also inserted at the front of the current TaskConsumer's local queue.

If a Task is destroyed or somehow executed outside of a TaskConsumer
thread, or a TaskConsumer thread is destroyed, the local queue of Tasks
is wrapped in a "rescue_task" Task, and spliced before the head of the
global queue.  This preserves the sequential ordering of tasks.

In all cases the order of sequential execution of Tasks that are
appended to another Task is preserved.

The unused queue insertion functions are removed.

Exclusion is now simply a mutex, a bool, and a Task with an empty
function.  Tasks that queue up waiting for the mutex are stored in
Exclusion's Task, and Exclusion simply runs that task when the
ExclusionState is released.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
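A single-threaded sketch of the ordering rule only. When a task finishes, its post-exec queue is spliced at the front of the consumer's local queue in append order, so followers run before any later work. SeqTask and LocalConsumer are hypothetical. Locking, the global TaskMaster queue, destruction with a non-empty queue, and the rescue_task path are not modeled.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>

struct SeqTask {
    std::function<void()> fn;
    std::deque<std::shared_ptr<SeqTask>> post_exec;  // run strictly after fn
};

// One consumer's local queue, drained in order.
class LocalConsumer {
public:
    void push_back(std::shared_ptr<SeqTask> t) { m_local.push_back(std::move(t)); }

    void drain() {
        while (!m_local.empty()) {
            auto t = m_local.front();
            m_local.pop_front();
            t->fn();
            // Splice followers at the front, preserving the order they were appended.
            m_local.insert(m_local.begin(), t->post_exec.begin(), t->post_exec.end());
            t->post_exec.clear();
        }
    }

private:
    std::deque<std::shared_ptr<SeqTask>> m_local;
};
```

With A followed by B and C, B followed by D, and E queued after A, execution order is A, B, D, C, E.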
Zygo Blaxell
592580369e docs: btrfs-kernel: add the extent ref hash bug
Fixed in 5.11 and 5.10 but _not_ 5.10 or 5.4 (yet).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
0bbaddd54c docs: finally concede that the consensus spelling is "dedupe"
Change documentation and comments to use the word "dedupe," not "dedup"
as found in circa-3.15 kernel sources.

No changes in code or program output--if they used "dedup" before, they
will continue to be spelled "dedup" now.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
06a46e2736 chatter: add option to remove log level prefix
Some projects use only one log level, so there is no need to repeat it
for every line.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
45afce72e3 test: fd: note when bad cast exception is expected
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00