mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

193 Commits

Author SHA1 Message Date
Zygo Blaxell
6dbef5f27b fs: improve compatibility with linux-libc-dev 5.4
Fix the missing symbols that popped up when adding the chunk tree to
lib/fs.cc.  Also define the missing symbols instead of merely trying to
avoid them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-08 21:17:15 -05:00
Zygo Blaxell
d32f31f411 btrfs-tree: harden rlower_bound against exceptional objects
Rearrange the logic in `rlower_bound` so it can cope with a tree
that contains mostly block-aligned objects, with a few exceptions
filtered out by `hdr_stop`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
dd08f6379f btrfs-tree: add a method to get root backref items to BtrfsRootFetcher
This complements the already existing support for reading the fields of
a root backref.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
58ee297cde btrfs-tree: connect methods to the debug stream interface
In some cases functions already had existing debug stream support
which can be redirected to the new interface.  In other cases, new
debug messages are added.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
a3c0ba0d69 fs: add a runtime debug stream for btrfs tree searches
This allows plugging in an ostream at run time so that we can audit all
the search calls we are doing.
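
As a rough sketch of the shape of such an interface (the names here are
illustrative assumptions, not the actual crucible API), the debug stream
is a process-wide ostream pointer that the search wrappers write through
whenever it is set:

        #include <mutex>
        #include <ostream>
        #include <string>

        static std::mutex s_debug_mutex;
        static std::ostream *s_debug_stream = nullptr;

        // Point the stream at e.g. &std::cerr to audit searches, or at
        // nullptr to turn auditing back off.
        void set_search_debug_stream(std::ostream *os)
        {
                std::unique_lock<std::mutex> lock(s_debug_mutex);
                s_debug_stream = os;
        }

        // Called from the tree search code with a description of each call.
        void search_debug_log(const std::string &msg)
        {
                std::unique_lock<std::mutex> lock(s_debug_mutex);
                if (s_debug_stream) {
                        *s_debug_stream << msg << "\n";
                }
        }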

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
75040789c6 btrfs-tree: drop BtrfsFsTreeFetcher and clean up class comments
BtrfsFsTreeFetcher was used for early versions of the extent scanner, but
neither subvol nor extent scan now needs an object that is both persistent
and configured to access only one subvol.  BtrfsExtentDataFetcher does
the same thing in that case.

Clarify the comments on what the remaining classes do, so that
BtrfsFsTreeFetcher doesn't get inadvertently reinvented in the future.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
f9a697518d btrfs-tree: introduce BtrfsDataExtentTreeFetcher to read data extents without metadata
Binary searches can be extremely slow if the target bytenr is near a
metadata block group, because metadata items are not visible to the
binary search algorithm.  In a non-mixed-bg filesystem, there can be
hundreds of thousands of metadata items between data extent items, and
since the binary search algorithm can't see them, it will run searches
that iterate over hundreds of thousands of objects about a dozen times.

This is less of a problem for mixed-bg filesystems because the data and
metadata blocks are not isolated from each other.  The binary search
algorithm still can't see the metadata items, but there are usually
some data items close by to prevent the linear item filter from running
too long.

Introduce a new fetcher class (all the good names were taken) that tracks
where the end of the current block group is.  When the end of the current
block group is reached in the linear search, skip ahead to a block group
that can contain data items.
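
A self-contained sketch of the skip-ahead arithmetic, using a stand-in
block group table (the real fetcher reads chunk/block group items from
the filesystem; the names here are illustrative):

        #include <algorithm>
        #include <cstdint>
        #include <map>

        struct BlockGroup {
                uint64_t start = 0;
                uint64_t length = 0;
                bool holds_data = false;        // e.g. BTRFS_BLOCK_GROUP_DATA in flags
        };

        // Block groups keyed by starting bytenr, assumed already populated.
        using BlockGroupMap = std::map<uint64_t, BlockGroup>;

        // Return the first bytenr >= 'bytenr' that can hold data items,
        // skipping whole metadata block groups instead of iterating over
        // their items one at a time.
        uint64_t next_data_bytenr(const BlockGroupMap &bgs, uint64_t bytenr)
        {
                auto it = bgs.upper_bound(bytenr);
                if (it != bgs.begin()) --it;    // block group containing or preceding bytenr
                for (; it != bgs.end(); ++it) {
                        const BlockGroup &bg = it->second;
                        if (bg.holds_data && bg.start + bg.length > bytenr) {
                                return std::max(bytenr, bg.start);
                        }
                }
                return bytenr;                  // no data block group at or beyond bytenr
        }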

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
c4ba6ec269 fs: add a ntoa function for chunk types
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
925b12823e fs: add do_ioctl_nothrow and fsid methods to btrfs fs info
Enable use of the ioctl to probe whether two fds refer to the same btrfs,
without throwing an exception.
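
A minimal sketch of the probe, assuming the stock BTRFS_IOC_FS_INFO ioctl
(the crucible method names differ); it returns false instead of throwing
when either fd is not on a btrfs:

        #include <cstring>
        #include <sys/ioctl.h>
        #include <linux/btrfs.h>

        bool same_btrfs_nothrow(int fd_a, int fd_b)
        {
                btrfs_ioctl_fs_info_args a = {}, b = {};
                if (ioctl(fd_a, BTRFS_IOC_FS_INFO, &a) < 0) return false;      // not btrfs, or error
                if (ioctl(fd_b, BTRFS_IOC_FS_INFO, &b) < 0) return false;
                return std::memcmp(a.fsid, b.fsid, sizeof(a.fsid)) == 0;
        }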

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
Zygo Blaxell
ad11db2ee1 openat2: supply the missing definitions for building with old headers and new kernel
Apparently Ubuntu 20 has upgraded to kernel 5.15, but still builds things
with 5.4 headers.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-19 22:20:06 -05:00
Zygo Blaxell
c53fa04a2f task: fixes for priority and idle Tasks
Tasks are not allowed to be queued more than once, but it is allowed
to queue a Task while it's already running, which means a Task could be
executed on two threads in parallel.  Tasks detect this and handle it
by queueing the Task on its own post-exec queue instead.  That in turn
leads to Workers that continually execute the same Task if that Task
doesn't create any new Tasks, while other Tasks sit on the Master queue
waiting for a Worker to dequeue them.

For idle Tasks, we don't want the Task to be rescheduled immediately.
We want the idle Task to execute again after every available Task on
both the main and idle queues has been executed.

Fix these by having each Task reschedule itself on the appropriate
queue when it finishes executing.

Priority-queued Tasks should be executed in priority order across not just
one Task's post-exec queue, but the entire local queue of the TaskConsumer.

Fix this by moving the sort into either the TaskConsumer that receives
a post-exec queue, if there is one, or into the Task that is created
to insert the post-exec queue into a TaskConsumer when one becomes
available in the future.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-15 00:43:25 -05:00
Zygo Blaxell
a819d623f7 task: do not allow queue loops in priority queueing mode
Tasks using non-priority FIFO dependency tracking can insert themselves
into their own queue, to run the Task again immediately after it exits.

For priority queues, this attempts to splice the post-exec queue into
itself, which doesn't seem like a good idea.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-12 15:28:26 -05:00
Zygo Blaxell
de9d72da80 task: flatten queues of dependent Tasks
Suppose Tasks A, B, and C are created in that order, and are currently
running.  Task T acquires Exclusion E.  Tasks B, A, and C attempt to
acquire the same Exclusion, in that order, but fail because Task T holds it.

The result is Task T with a post-exec queue:

        T, [ B, A, C ]  sort_requested

Now suppose Task U acquires Exclusion F, then Task T attempts to acquire
Exclusion F.  Task T fails to acquire F, so T is inserted into U's
post-exec queue.  The result at the end of the execution of T is a tree:

        U, [ T ]  sort_requested
             \-> [ B, A, C ] sort_requested

Task T exits after failing to acquire a lock.  When T exits, T will
sort its post-exec queue and submit the post-exec queue for execution
immediately:

        Worker 1: U, [ T ]  sort_requested
        Worker 2: A, B, C

This isn't ideal because T, A, B, and C all depend on at least one
common Exclusion, so they are likely to immediately conflict with T
when U exits and T runs again.

Ideally, A, B, and C would at least remain in a common queue with T,
and ideally that queue is sorted.

Instead of inserting T into U's post-exec queue, insert T and all
of T's post-exec queue, which creates a single flattened Task list:

        U, [ T, B, A, C ]   sort_requested

Then when U exits, it will sort [ T, B, A, C ] into [ A, B, C, T ],
and run all of the queued Tasks in age priority order:

        U exited, [ T, B, A, C ]   sort_requested

        U exited, [ A, B, C, T ]

        [ A, B, C, T ] on TaskConsumer queue

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-12 14:05:44 -05:00
Zygo Blaxell
74d8bdd60f task: add an insert method for priority-queueing Tasks by age
Task started out as a self-organizing parallel-make algorithm, but ended
up becoming a half-broken wait-die algorithm.  When a contended object
is already locked, Tasks enter a FIFO queue to restart and acquire the
lock.  This is the "die" part of wait-die (all locks on an Exclusion are
non-blocking, so no Task ever does "wait").  The lock queue is FIFO wrt
_lock acquisition order_, not _Task age_ as required by the wait-die
algorithm.

Make it a 25%-broken wait-die algorithm by sorting the Tasks on lock
queues in order of Task ID, i.e. oldest-first, or FIFO wrt Task age.
This ensures the oldest Task waiting for an object is the one to get
it when it becomes available, as expected from the wait-die algorithm.

This should reduce the amount of time Tasks spend on the execution queue,
and reduce memory usage by avoiding the accumulation of Tasks that cannot
make forward progress.
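
A minimal sketch of the oldest-first ordering, assuming each Task carries
a monotonically increasing id assigned at creation (illustrative types,
not the actual crucible classes):

        #include <cstdint>
        #include <list>
        #include <memory>

        struct TaskState {
                uint64_t m_id;  // creation order, i.e. Task age
                // ... the rest of the Task state ...
        };

        using TaskQueue = std::list<std::shared_ptr<TaskState>>;

        // Sort a lock queue so the oldest Task is dequeued first, as the
        // wait-die algorithm expects.  std::list::sort is stable and keeps
        // the splice-friendly properties noted below.
        void sort_oldest_first(TaskQueue &queue)
        {
                queue.sort([](const auto &a, const auto &b) {
                        return a->m_id < b->m_id;
                });
        }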

Note that turning `TaskQueue` into an ordered container would have
undesirable side-effects:

 * `std::list` has some useful properties wrt stability of object
 location and cost of splicing.  Other containers may not have these,
 and `std::list` does have a `sort` method.

 * Some Task objects are created at the beginning and reused continually,
 but we really do want those Tasks to be executed in FIFO order wrt
 submission, not Task ID.  We can exclude these Tasks by only doing the
 sorting when a Task is queued for an Exclusion object.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-12 00:35:37 -05:00
Zygo Blaxell
8bc90b743b task: get rid of the insert_task method
Nothing calls it (not even tests), and there's significant functional
overlap with `try_lock`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-11 23:39:55 -05:00
Zygo Blaxell
82f1fd8054 process: replace crucible::gettid() with a weak symbol
Since we're now using weak symbols for dodgy libc functions, we might
as well do it for gettid() too.

Use the ::gettid() global namespace and let libc override it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-09 01:37:44 -05:00
Zygo Blaxell
a9b07d7684 openat2: create a weak syscall wrapper for it
openat2 allows closing more TOCTOU holes, but we can only use it when
the kernel supports it.

This should disappear seamlessly when libc implements the function.
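
A sketch of the weak wrapper, assuming the conventional prototype (struct
open_how comes from linux/openat2.h on 5.6+ headers; older headers also
need the struct and syscall number supplied, as the later "supply the
missing definitions" commit does):

        #include <sys/syscall.h>
        #include <unistd.h>
        #include <linux/openat2.h>      // struct open_how

        #ifndef SYS_openat2
        #define SYS_openat2 437         // same syscall number on every arch
        #endif

        // Weak definition: if libc ever ships a real openat2(), the strong
        // symbol from libc wins at link time and this wrapper disappears.
        extern "C" __attribute__((weak))
        int openat2(int dirfd, const char *pathname, struct open_how *how, size_t size)
        {
                return syscall(SYS_openat2, dirfd, pathname, how, size);
        }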

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-09 01:36:39 -05:00
Zygo Blaxell
a02588b16f time: add more methods to support dynamic rate throttling
 * Allow RateLimiter to change rate after construction.
 * Check range of rate argument in constructor.
 * Atomic increment for RateEstimator.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
21cedfb13e bytevector: rename the argument to operator[] to be more descriptive
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
9beb602b16 task: ignore paused status while calculating dynamic thread count
bees might be unpaused at any time, so make sure that the dynamic load
calculation is ready with a non-zero thread count.

This avoids a delay of up to 5 seconds when responding to SIGUSR2
when loadavg tracking is enabled.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
1cbc894e6f task: start up more worker threads when unpausing
When paused, TaskConsumer threads will eventually notice the paused
condition and exit; however, there's nothing to restart threads when
exiting the paused state.

When unpausing, and while the lock is already held, create TaskConsumer
threads as needed to reach the target thread count.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:53:00 -05:00
Zygo Blaxell
d74862f1fc fs: set the correct nr_items to 0 in the ENOENT search case
Commit 72c3bf8438830b65cae7bdaff126053e562280e5 ("fs: handle ENOENT
within lib") was meant to prevent exceptions when a subvol is deleted.

If the search ioctl fails, the kernel won't set nr_items in the
ioctl output, which means `nr_items` still has the input value.  When
ENOENT is detected, `this->nr_items` is set to 0, then later `*this =
ioctl_ptr->key` overwrites `this->nr_items` with the original requested
number of items.

This replaced the ENOENT exception with an exception triggered by
interpreting garbage in the memory buffer.  The number of exceptions
was reduced because the memory buffers are frequently reused, but upper
layers would then reject the data or ignore it because it didn't match
the key range.

Fix by setting `ioctl_ptr->key.nr_items`, which then overwrites
`this->nr_items`, so the loop that extracts items from the ioctl data
gets the right number of items (i.e. zero).

Fixes: 72c3bf8438830b65cae7bdaff126053e562280e5 ("fs: handle ENOENT within lib")
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:48:15 -05:00
Zygo Blaxell
e99a505b3b bytevector: don't deadlock on operator<<
operator<< was a friend function that locked the ByteVector, then invoked
hexdump on the ByteVector, which used ByteVector::operator[]...which
locked the ByteVector again, resulting in a deadlock.

operator<< shouldn't be a friend anyway.  Make hexdump use the
normal public access methods for ByteVector.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 23:39:33 -05:00
Zygo Blaxell
b99d80b40f task: add an idle queue
Add a second-level queue which is serviced only when the local and global
queues are empty.

At some point there might be a need to implement a full priority queue,
but for now two classes are sufficient.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
099ad2ce7c fs: add some performance metrics for TREE_SEARCH_V2 calls
These give some visibility into how efficiently bees is using the
TREE_SEARCH_V2 ioctl.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
a59a02174f table: add a simple text table renderer
This should help clean up some of the uglier status outputs.

Supports:

 * multi-line table cells
 * character fills
 * sparse tables
 * insert, delete by row and column
 * vertical separators

and not much else.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
606ac01d56 multilock: allow turning it off
Add a master switch to turn off the entire MultiLock infrastructure for
testing, without having to remove and add all the individual entry points.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
72c3bf8438 fs: handle ENOENT within lib
This prevents the storms of exceptions that occur when a subvol is
deleted.  We simply treat the entire tree as if it was empty.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
72958a5e47 btrfs-tree: accessors for TreeFetcher classes' type and tree values
Sometimes we have a generic TreeFetcher and we need to know which tree
it came from.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
f25b4c81ba btrfs-tree: add root refs and extent flags fields
Lazily filling in accessor methods for btrfs objects as needed by bees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
792fdbbb13 fs: get rid of 16 MiB limit on dedupe requests
The kernel has not imposed a 16 MiB limit on dedupe requests since
v4.18-rc1 b67287682688 ("Btrfs: dedupe_file_range ioctl: remove 16MiB
restriction").

Kernels before v4.18 would truncate the request and return the size
actually deduped in `bytes_deduped`.  Kernel v4.18 and later will loop
in the kernel until the entire request is satisfied (although still
in 16 MiB chunks, so larger extents will be split).

Modify the loop in userspace to measure the size the kernel actually
deduped, instead of assuming the kernel will only accept 16 MiB.
On current kernels this will always loop exactly once.

Since we now rely on `bytes_deduped`, make sure it has a sane value.
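
A sketch of the reworked loop, using the FIDEDUPERANGE layout from
linux/fs.h (error handling trimmed; not the actual bees code):

        #include <algorithm>
        #include <cstdint>
        #include <cstdlib>
        #include <sys/ioctl.h>
        #include <linux/fs.h>

        // Dedupe 'length' bytes of src against dst, advancing by whatever
        // the kernel reports in bytes_deduped on each pass.  On current
        // kernels the loop runs exactly once.
        bool dedupe_range(int src_fd, uint64_t src_off, uint64_t length,
                          int dst_fd, uint64_t dst_off)
        {
                while (length > 0) {
                        const size_t args_size = sizeof(file_dedupe_range) + sizeof(file_dedupe_range_info);
                        auto *args = static_cast<file_dedupe_range *>(std::calloc(1, args_size));
                        args->src_offset = src_off;
                        args->src_length = length;
                        args->dest_count = 1;
                        args->info[0].dest_fd = dst_fd;
                        args->info[0].dest_offset = dst_off;
                        const int rv = ioctl(src_fd, FIDEDUPERANGE, args);
                        const uint64_t done = args->info[0].bytes_deduped;
                        const int status = args->info[0].status;
                        std::free(args);
                        if (rv < 0 || status != FILE_DEDUPE_RANGE_SAME || done == 0) {
                                return false;   // error, data differed, or no forward progress
                        }
                        src_off += done;
                        dst_off += done;
                        length -= std::min(done, length);
                }
                return true;
        }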

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
3839690ba3 lib: fix btrfs_data_container pointer casts for 32-bit userspace on 64-bit kernels
Apparently reinterpret_cast<uint64_t> sign-extends 32-bit pointers.
This is OK when running on a 32-bit kernel that will truncate the pointer
to 32 bits, but when running on a 64-bit kernel, the extra bits are
interpreted as part of the (now very invalid) address.

Use <uintptr_t> instead, which is an unsigned integer type with the same
width as the arch's pointer type.  Ordinary numeric conversion can take
it from there, filling the rest of the word with zeros.
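
The fix in miniature (the helper name is illustrative):

        #include <cstdint>

        // A 32-bit pointer converted through uintptr_t is zero-extended
        // into the 64-bit ioctl field, rather than sign-extended as
        // reinterpret_cast<uint64_t> would do on a 32-bit arch.
        uint64_t ptr_to_u64(void *p)
        {
                return reinterpret_cast<uintptr_t>(p);
        }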

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-04-17 23:07:41 -04:00
Zygo Blaxell
75b2067cef btrfs-tree: fix build on clang++16
The "loops" variable isn't read (only set) if not built with extra
debug code.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-05-07 21:23:27 -04:00
Zygo Blaxell
7c764a73c8 fs: allow BtrfsIoctlLogicalInoArgs to be reused, remove virtual methods
Some malloc implementations will try to mmap() and munmap() large buffers
every time they are used, causing a severe loss of performance.

Nothing ever overrode the virtual methods, and there was no virtual
destructor, so they cause compiler warnings at build time when used with
a template that tries to delete pointers to them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-23 22:40:12 -05:00
Zygo Blaxell
3b85fc8bc7 lib: drop version.cc entirely
crucible::VERSION doesn't make much sense now that libcrucible no
longer exists as a shared library.  Nothing ever referenced it, so
it can go away.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-27 22:16:02 -05:00
Zygo Blaxell
4df1b2c834 lib: simplify dependency generation
We don't need to run all the dependencies first; Make can do those in parallel.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-27 22:16:02 -05:00
Zygo Blaxell
495218104a fd: FS_IOC_SETFLAGS takes an int* argument not a long*
According to ioctl_iflags(2):

	The type of the argument given to the FS_IOC_GETFLAGS and
	FS_IOC_SETFLAGS  operations is int *, notwithstanding the
	implication in the kernel source file include/uapi/linux/fs.h
	that the argument is long *.

So this code doesn't work on be64 machines.

Also, Valgrind complains about it.
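
A minimal sketch of the corrected call, per ioctl_iflags(2) (the helper
name is illustrative):

        #include <sys/ioctl.h>
        #include <linux/fs.h>

        int set_iflags(int fd, int flags)
        {
                // The argument must be an int *, not a long *, despite the
                // implication in include/uapi/linux/fs.h.
                return ioctl(fd, FS_IOC_SETFLAGS, &flags);
        }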

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-27 22:16:02 -05:00
Zygo Blaxell
e82ce3c06e fd: pwrite returns ssize_t not int
A subtle distinction, and not one that is particularly relevant to bees,
but it does make toolchains complain.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-27 22:16:02 -05:00
Zygo Blaxell
bd336e81a6 fs: get rid of base class btrfs_ioctl_logical_ino_args
Another instance of the pattern where we derived a crucible class
from a btrfs struct.  Make it an automatic variable instead.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-27 22:16:02 -05:00
Zygo Blaxell
cb2c20ccc9 fs: get rid of base class btrfs_ioctl_same_extent_info
We only use BtrfsExtentInfo when it's exactly equivalent to the
base, so drop the derived class.

While we're here, fix BtrfsExtentSame::add so it uses a btrfs-compatible
uint64_t instead of an off_t.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-05 01:10:17 -05:00
Zygo Blaxell
ded5bf0148 btrfs-tree: fix whitespace and const
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-05 01:10:17 -05:00
Zygo Blaxell
d5de012a17 btrfs-tree: translate item types for error messages
Look up the name when filling in the what() field for the exception.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-05 01:10:17 -05:00
Zygo Blaxell
66d1e8a89b btrfs-tree: add chunk items: length and type
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-05 01:10:17 -05:00
Zygo Blaxell
563e584da4 task: use pthread_setname_np correctly
It turns out I've been using pthread_setname_np wrong the whole time:

 * on Linux, the thread name length is 15 characters.
   TASK_COMM_LEN is 16 bytes, and the last one is always 0.
   This is now hardcoded in many places and cannot be changed.

 * pthread_setname_np doesn't return -errno, so DIE_IF_MINUS_ERRNO
   was the wrong macro.  On the other hand, we never want to do anything
   differently when pthread_setname_np fails, so we never needed to
   check the return value.

Also, libc silently ignores attempts to set the thread name when it is too
long.  That's almost certainly a libc bug, but libc probably suppresses
the error result for the same reasons I ignore the error result.

Wrap the pthread_setname function with a C++ std::string overload that
truncates the argument at 15 characters, so we at least get the first
part of the task name in the thread name field.  Later commits can deal
with making the bees thread names shorter.

Also wrap pthread_getname for symmetry.
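
A sketch of the wrappers (assuming these free-function names; 15
characters plus the terminating NUL is Linux's TASK_COMM_LEN):

        #include <pthread.h>
        #include <string>

        // Truncate to 15 characters so pthread_setname_np can neither fail
        // nor be silently ignored; the return value is deliberately dropped.
        void pthread_setname(const std::string &name)
        {
                pthread_setname_np(pthread_self(), name.substr(0, 15).c_str());
        }

        std::string pthread_getname()
        {
                char buf[16] = { 0 };   // TASK_COMM_LEN
                pthread_getname_np(pthread_self(), buf, sizeof(buf));
                return buf;
        }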

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-05 01:10:17 -05:00
Zygo Blaxell
4d59939b07 btrfs-tree: introduce lightweight classes for btrfs tree search operations
btrfs-tree provides classes for low-level access to btrfs tree objects.

An item class is provided to decode polymorphic btrfs item fields.

Several tree classes provide forward and backward iteration over raw
object items at different tree levels.

A csum tree class provides convenient access to csums by bytenr,
supporting all current btrfs csum types.

Wrapper classes for inode and subvol items provide direct access to
btrfs metadata fields without clumsy stat() wrappers or ioctls.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:59 -05:00
Zygo Blaxell
148cc03060 bytevector: do not deadlock in self-assignment
Not that this is a particularly useful use case, but it will lock up,
and it should not.
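
The usual guard, in sketch form (not the actual ByteVector code): when
operator= locks both objects' mutexes, the same object appearing on both
sides has to be detected first, or the second lock attempt deadlocks.

        #include <mutex>

        struct Buffer {
                mutable std::mutex m_mutex;
                // ... pointer and size members elided ...

                Buffer &operator=(const Buffer &that)
                {
                        if (this == &that) return *this;        // self-assignment: nothing to do
                        std::lock(m_mutex, that.m_mutex);       // lock both without ordering deadlock
                        std::lock_guard<std::mutex> lock_this(m_mutex, std::adopt_lock);
                        std::lock_guard<std::mutex> lock_that(that.m_mutex, std::adopt_lock);
                        // ... copy the members ...
                        return *this;
                }
        };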

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:58 -05:00
Zygo Blaxell
b699325a77 bytevector: don't need _all_ of those mutexes
Methods that don't even look at the pointer don't need a mutex.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:58 -05:00
Zygo Blaxell
a59d89ea81 bytevector: add some fugly mutexes
We are using ByteVectors from multiple threads in some cases.  Mostly
these are the status and progress threads which read the ByteVector
object references embedded in BEESNOTE macros.

Since it's not clear what the data race implications are, protect
the shared_ptr in ByteVector with a mutex for now.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:58 -05:00
Zygo Blaxell
d1015b683f bytevector: add ostream output with hexdump
There is a hexdump template in fs.  Move hexdump to its own header,
then ByteVector can use it too.
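
A rough sketch of the shape of such a template, working over any
indexable byte container (the output format here is illustrative, not
the crucible hexdump's exact format):

        #include <cstddef>
        #include <cstdint>
        #include <iomanip>
        #include <ostream>

        template <class V>
        std::ostream &hexdump(std::ostream &os, const V &v)
        {
                for (size_t i = 0; i < v.size(); ++i) {
                        if (i % 16 == 0) {
                                os << std::hex << std::setfill('0') << std::setw(8) << i << ":";
                        }
                        os << " " << std::setw(2) << unsigned(uint8_t(v[i]));
                        if (i % 16 == 15) os << "\n";
                }
                if (v.size() % 16 != 0) os << "\n";
                return os << std::dec;
        }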

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:58 -05:00
Zygo Blaxell
b143664747 task: use exponential backoff algorithm to set thread count
Tasks are often running longer than 5 seconds (especially extents with
multiple references requiring copy operations), so the load tracking
algorithm needs to average several samples over a longer period of time
than 5 seconds.  If the sample period is 60 seconds, we end up recomputing
the original load average from current_load, so skip the rounding error
and use the original load average value.

Arguably the real fix is to break up the more complex extent operations
over several downstream Task objects, but that's a more significant
design change.

Tweak the attack and decay rates so that threads are started a little
more slowly, but still stopped rapidly when load spikes up.

Remove the hysteresis to provide support for load average targets
below 1, or with fractional components, with a PWM-like effect.
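
Purely as an illustration of the attack/decay idea (the constants and
shape here are invented, not bees' actual algorithm): the thread-count
target chases the load error with a small gain when growing and a larger
gain when shrinking, which with fractional targets gives the PWM-like
behaviour around the setpoint.

        #include <algorithm>

        double adjust_thread_target(double current_target, double loadavg,
                                    double load_setpoint, double max_threads)
        {
                const double attack = 0.1;      // start threads slowly
                const double decay  = 0.5;      // stop threads quickly on a load spike
                const double error = load_setpoint - loadavg;
                const double gain = (error > 0) ? attack : decay;
                const double next = current_target + gain * error;
                return std::min(max_threads, std::max(0.0, next));
        }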

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00