The hash table is one of the few cases in bees where a non-trivial amount
of page cache memory will be used in a predictable way, so we can advise
the kernel about our IO demands in advance.
Use WILLNEED to prefetch hash table pages at startup.
Use DONTNEED to trigger writeback on hash table pages at shutdown.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In theory, we don't need the pread() loop, because the kernel will do a
better job with readahead().
In practice, we might still need the pread() code, as the readahead will
occur at idle IO priority, which could adversely affect bees performance.
More testing is required.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This allows these components to be used by test executables without
pulling in all of bees, and more rapidly iterate their code.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Currently if crawl throws an exception, we don't have basic information
about what was being crawled or even if the crawler was running at all.
These traces also help identify the causes of early exception failures.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
There seem to be multiple ways to do readahead in Linux, and only some
of them work. Hopefully reading the actual data is one of them.
This is an attempt to avoid page-by-page reads in the generic dedupe code.
We load both extents into the VFS cache (read sequentially) and hope they
are still there by the time we call dedupe on them.
We also call readahead(2) and hopefully that either helps or does nothing.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Change documentation and comments to use the word "dedupe," not "dedup"
as found in circa-3.15 kernel sources.
No changes in code or program output--if they used "dedup" before, they
will continue to be spelled "dedup" now.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Higher CPU core counts became more common, and kernel bugs became less
common, since the arbitrary 8-thread limit was introduced. We can remove
the limit now, and treat any remaining scaling inefficiency as a bug to
be removed.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Apparently it is missing in newer Linux headers, making
builds fail. We don't need it, so remove it.
Closes: #160
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Now that tempfiles are using pool checkin functions to control their
size, we don't need a size limit in realign().
We keep the limit in make_copy because it's a sanity check against
letting a multi-terabyte copy operation slip through.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Get rid of the thread-local TempFiles and use Pool instead. This
eliminates a potential FD leak when the loadavg governor repeatedly
creates and destroys threads.
With the old per-thread TempFiles, we were guaranteed to have exclusive
ownership of the TempFile object within the current thread. Pool is
somewhat stricter: it only guarantees ownership while the checked-out
Handle exists. Adjust the users of TempFile objects to ensure they hold
the Handle object until they are finished using the TempFile.
It appears that maintaining large, heavily-reflinked, long-lived temporary
files costs more than truncating after every use: btrfs has to write
multiple references to the temporary file's extents, then some commits
later, remove references as the temporary file is deleted or truncated.
Using the temporary file in a dedupe operation flushes the data to disk,
so nothing is saved by pretending that there is writeback pipelining and
trying to avoid flushes in truncate. Pool provides usage tracking and
a checkin callback, so use it to truncate the temporary file immediately
after every use.
Redesign TempFile so that every instance creates exactly one Fd which
persists over the lifetime of the TempFile object. Provide a reset()
method which resets the file back to the initial state and call it from
the Pool checkin callback. This makes TempFile's lifetime equivalent to
its Fd's lifetime, which simplifies interactions with FdCache and Roots.
This change means we can now blacklist temporary files without having
an effective memory leak, so do that. We also have a reason to ever
remove something from the blacklist, so add a method for that too.
In order to move to extent-centric addressing, we need to be able to
reliably open temporary files by root and inode number. Previously we
would place TempFile fd's into the cache with insert_root_ino, but the
cache would be cleared periodically, and it would not be possible to
reopen temporary files after that happened. Now that the TempFile's
lifetime is the same as the TempFile Fd's lifetime, we can have TempFile
manage a separate FileId -> Fd map in Roots which is unaffected by the
periodic cache clearing. BeesRoots::open_root_ino_nocache will check
this map before attempting to open the file via btrfs root+ino lookup,
and return it through the cache as if Roots had opened the file via btrfs.
Hold a reference to BeesRoots in BeesTempFile because the usual way
to get such a reference now throws an exception in BeesTempFile's
destructor.
These changes make method BeesTempFile::create() and all methods named
insert_root_ino unnecessary, so delete them.
We construct and destroy TempFiles much less often now, so make their
constructor and destructor more informative.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
I was never able to prove a connection between fsync() and deadlock bugs.
There were too many deadlock bugs to be able to isolate a bug that is
triggered specifically by fsync.
Update the comment (which has been unchanged since kernel 4.14). We still
may want to do fsync() on temporary files someday, but there's a full
internal API rewrite between here and there.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
It's a pain to read, edit, and format large blocks of text in C++ code,
so rip the usage message out of bees.cc and put it in a plain text file.
Use a minimal translator to convert it into a C string.
While we're here, remove the multiple roots feature from the command
line synopsis, as we don't really support it any more. Also clarify
that "id 5" is "subvol id 5", and describe in one sentence what
workaround-btrfs-send does.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
uncaught_exception() had only the one valid use case, and it can be
reimplemented by literally calling current_exception() instead.
current_exception() has several valid use cases, so it is not likely
to be deprecated any time soon.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We cannot use BeesContext::roots() until after
BeesContext::set_root_path() has been called.
Save up the parameter settings until then.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In version 2.30 glibc added it's own gettid() function. This resulted in
"error: call of overloaded ‘gettid()’ is ambiguous" because gettid()
now exists in both namespace crucible and std.
For now, use explicit references to namespace crucible. This continues
to work with new and old libc without having to test specific library
versions.
At some point, glibc gettid() will be deployed widely enough that we can
remove the crucible version entirely.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Some build environments (ARM? AARCH64?) do not have the fields
si_lower and si_upper in siginfo.
bees doesn't need them, so don't try to access them.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Introduce a mechanism to suppress exceptions which do not produce a
full stack trace for common known cases where a loop should be aborted.
Use this mechanism to suppress the infamous "FIXME" exception.
Reduce the log level to at most NOTICE, and in some cases DEBUG.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Capture SIGINT and SIGTERM and shut down, preserving current completed
crawl and hash table state.
* Executing tasks are completed, queued tasks are paused.
* Crawl state is saved.
* The crawl master and crawl writeback threads are terminated.
* The task queue is flushed.
* Dirty hash table extents are flushed.
* Hash prefetch and writeback threads are terminated.
* Hash table is deallocated.
* FD caches and tmpfiles are destroyed.
* Assuming the above didn't crash or deadlock, bees exits.
The above order isn't the fastest, but it does roughly follow the
shared_ptr dependencies and avoids data races--especially those that
might lead to bees reporting an extent scanned when it was only queued
for future scanning that did not occur.
In case of a violation of expected shared_ptr dependency order,
exceptions in BeesContext child object accessor methods (i.e. roots(),
hash_table(), etc) prevent any further progress in threads that somehow
remain unexpectedly active.
Move some threads from main into BeesContext so they can be stopped
via BeesContext. The main thread now runs a loop waiting for signals.
A slow FD leak was discovered in TempFile handling. This has not been
fixed yet, but an implementation detail of the C++ runtime library makes
the leak so slow it may never be important enough to fix.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The deadlock seems to be fixed now (if there ever was one--there certainly
were deadlocks, but matching deadlocks to root causes is non-trivial
and a number of distinct deadlock cases have been fixed in recent years).
The benchmark data is inconclusive about whether it is better to fsync or
not to fsync. A paranoia option might be useful here.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
After weeks of testing I copied part of a change to main without copying
the rest of the change, leading to an immediate segfault on startup.
So here is the rest of the change: limit the number of
BeesContexts per process to 1. This change was discussed at
https://github.com/Zygo/bees/issues/54#issuecomment-360332529 but there
are more reasons to do it now: the candidates to replace the current
hash table format are less forgiving of sharing hash tables, and it may
even become necessary to have more than one hash table per BeesContext
instance (e.g. to keep datasum and nodatasum data separate).
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
https://github.com/Zygo/bees/issues/91 describes problems encountered
when running bees on systems with many CPU cores.
Limit the computed number of threads (using --thread-factor or the
default) to a maximum of 8 (i.e. the number of logical cores in a modern
laptop). Users can override the limit by using --thread-count.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Introduce --workaround options which trade performance or effectiveness to
avoid triggering kernel bugs.
The first such option is --workaround-btrfs-send, which avoids making any
modification to read-only subvols to avoid btrfs send bugs.
Clean up usage message: no tabs for formatting, split options into
sections by theme.
Make scan mode a non-static data member like all (most?) other options.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The -g option limits the number of worker threads when the target load
average is exceeded. On some systems the load normally runs high, and
continuous bees operation is required to avoid running out of disk space.
Add a -G/--thread-min option to force at least some threads to continue
running.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Add -g / --loadavg-target parameter to track system load and add or
remove bees worker threads dynamically to keep system load close to the
loadavg target. Thread count may vary from zero to the maximum
specified by -c or -C, and is adjusted every 5 seconds.
This is better than implementing a similar load average scheme from
outside of the process (though that is still possible) because the
in-process load tracker does not disrupt the performance timing feedback
mechanisms as a freezer cgroup or SIGSTOP would when controlling bees
from outside. The internal load average tracker can also adjust the
number of active threads while an external tracker can only choose from
the maximum or zero.
Also fix a bug where a Task could deadlock waiting for itself to exit
if it tries to insert a new Task after the number of worker threads has
been set to zero.
Also correct usage message for --scan-mode (values are 0..2) since
we are touching adjacent lines anyway.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Log messages were already labelled with log levels, but there was no
way to filter by log level at run time.
Implement the filter inside the bees process so it can skip evaluation
of the BEESLOG* arguments if the log messages would not be emitted.
Fixes: https://github.com/Zygo/bees/issues/67
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Having too many "write a message to the log" primitives is confusing,
and having one that intermittently and silently discards output is even
_more_ confusing.
Replace all BEESINFO with appropriate BEESLOG*s. Usually DEBUG.
Except for one or two that occur too often. Just delete those.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Since we are now unconditionally rendering the print_fn as a static
string, there is no need for it to be a function. We also need it to
be brief and mostly constant.
Use a string instead. Put the string before the function in the Task
constructor arguments so that the title string appears as a heading in
code, since we are making a breaking API change already.
Drop TASK_MACRO as it is broken by this change, but there is no similar
usage of Task anywhere to make it worth fixing.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Move pthread_setname_np to the same place we do pthread_getname_np.
Detect errors in pthread_getname_np--but don't throw an exception
because we would call ourself recursively from the exception handler
when it tries to log the exception.
Fix the order of set_name and the first BEESNOTE/BEESLOG call in threads,
closing small time intervals where logs have the wrong thread name,
and that wrong name becomes persistent for the thread.
Make the main thread's name "bees" because Linux kernel stack traces use
the pthread name of the main thread instead of the name of the process.
Anonymous threads get the process name (usually "bees"). We should not
have any such threads, but we do. This appears to occur mostly during
exception stack unwinding. GCC/pthread bug?
Fixes: https://github.com/Zygo/bees/issues/51
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When a Task worker thread is executing a Task, the thread name is less
useful than the Task description.
Use the Task description instead of the thread name if the thread has
no BeesThread name and the thread is currently executing a task.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Threads from the Task module in libcrucible don't set BeesNote::tl_name.
Even if they did, in Task context the thread name is unspecific to the point
of meaninglessness.
Use the Task::print method as the name for such threads, and be sure
that future Task print functions are designed for that usage.
The extra complexity in BeesNote::get_name() seems preferable to
bombarding pthread_setname_np hundreds or thousands of times per second.
FIXME: we are now calling Task::print() on every BeesNote, which
is effectively unconditionally. Maybe we should have Task::print()
and get_name() return a closure, or just evaluate Task::print() once
and cache it in TaskState, or define Task's constructor with a string
argument instead of the current print_fn closure.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit adds log levels to the output. In systemd, it makes colored
lines, otherwise it's probably just a number. Bees is very chatty, so
this paves the road for log level filtering.
Signed-off-by: Kai Krakow <kai@kaishome.de>
There are two subvol scan algorithms implemented so far. The two modes
are unimaginatively named 0 and 1.
0: sorts extents by (inode, subvol, offset),
1: scans extents round-robin from all subvols.
Algorithm 0 scans references to the same extent at close to the same
time, which is good for performance; however, whenever a snapshot is
created, the scan of the entire filesystem restarts at the beginning of
the new snapshot.
Algorithm 1 makes continuous forward progress even when new snapshots
are created, but it does not benefit from caching and will force the
kernel to reread data multiple times when there are snapshots.
The algorithm can be selected at run-time using the -m or --scan-mode
option.
We can collect some field data on these before replacing them with
an extent-tree-based scanner. Alternatively, for pre-4.14 kernels,
we can keep these two modes as non-default options.
Currently these algorithms have terrible names. TODO: fix that, but
also TODO: delete all that code and do scans directly from the extent
tree instead.
Augment the scan algorithms relative to their earlier implementation by
batching multiple extents to scan from each subvol before switching to
a different subvol.
Sprinkle some BEESNOTEs on the Task objects so that they don't
disappear from the thread status output.
Adjust some timing constants to deal with the increased latency from
competing threads.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Distribute incoming extents across a thread pool for faster execution
on multi-core, multi-disk environments.
Switch extent enumeration model to scan extent refs consecutively(ish).
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Remove some dead code because dedup-related deadlocks have not been
observed since Linux kernel v4.11.
Preserve rationale of remaining #if 0 block (why we do write/rename
instead of write/fsync/rename) so that people don't try to replace the
"missing" fsync() there.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
With kernel 4.14 there is no sign of the previous LOGICAL_INO performance
problems, so there seems to be no need to throttle threads using this
ioctl.
Increase the FD cache size limits and scan thread count. Let the kernel
figure out scheduling.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
GCC 7 and higher turn a previous warning into an error for implicit
fallthrough. Let's hint the compiler that this is intentional here.
Signed-off-by: Kai Krakow <kai@kaishome.de>
Adjust bees to match changes in Chatter's interface.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit 66fd28830d42acbd837dfb08b89a1f9c05c6d474)
To make bees more friendly to use with syslog/systemd, we add an option
to omit timestamps from the log output.
Signed-off-by: Kai Krakow <kai@kaishome.de>
This commit adds a simple getopt options parser to show help. This can
be used as a boilerplate for adding more options later.
Signed-off-by: Kai Krakow <kai@kaishome.de>