There are kernel bugs in LOGICAL_INO from time to time; however, we
can't avoid these bugs by serializing LOGICAL_INO calls.
It hasn't been used for some time, so remove the code and
less-than-completely-accurate comments.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
For one thing, it should _say_ that there are too many duplicates.
We were making the user read the manual to find that out.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Forcing the entire hash table into immediate writeback causes crippling
write latencies at shutdown. Even discarding pages as they are read in
at startup can trigger a writeback latency spike if the pages are dirty
at read time.
Better to let the VM subsystem handle this on its own.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Putting this information in the logs saves us from having to ask for
the kernel version and machine name every time.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We need random numbers in more places, so centralize the engines.
Initialize with a proper random seed so every worker thread gets
different behavior.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
At the end of scanning one extent, in theory we do not need that extent
any more. In practice, it hurts benchmark scores if we drop the extents
after reading them.
Add a comment to note this where we put the bees_unreadhead call.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
BEESNOTE can only be seen if the status thread is running at the time,
making the log of activities during shutdown incomplete.
Wake up the status thread early during shutdown so the logged sequence
of shutdown actions is complete.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In the current architecture we can't directly measure the physical extent
size, and we can't make good decisions with the extent data (reference)
item alone. If the early return is enabled here, there is a small speedup
and a large drop in dedupe hit rate, especially when extent splits occur.
Leave the early return commented for now, but collect the event statistics.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Tree searches are all looking for specific item types. Skip over any
item types we are not interested in when resetting the search key for
the next search.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
BtrfsIoctlSearchKeyV2's constructor now fills in nr_items = 1, so we
don't need to set it explicitly any more.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The vector<uint8_t> in the hash table doesn't hurt very much--only a few
microseconds per 128K hash block.
The vector<uint8_t> in BeesBlockData hurts a bit more--we run that
constructor thousands of times per second.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Previously, when the bees send workaround is enabled, bees would
immediately advance the subvol's crawl status as if the entire subvol
had been scanned.
If the subvol is later made read-write, or if the workaround is disabled,
bees sees that the subvol has already been marked as scanned. This is
an unfortunate result if the subvol is inadvertently marked read-only
or if bees is inadvertently run with the send workaround disabled.
Instead, (almost) completely ignore the subvol: don't advance the crawl
pointer, don't consider the subvol in the list if searchable roots, and
don't consider the subvol when calculating min_transid for new subvols.
The "almost" part is: if the subvol scan has not yet started, keep its
start timestamp current so it won't mess up subvol traversal performance
metrics.
Also handle exceptions while determining whether a subvol is read-only,
as those apparently do happen.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In fiemap.h the members of struct fiemap are declared as __u64, but the
FIEMAP_MAX_OFFSET macro is an unsigned long long value:
$ grep FIEMAP_MAX_OFFSET -r /usr/include/
/usr/include/linux/fiemap.h:#define FIEMAP_MAX_OFFSET (~0ULL)
$ grep fe_length -r /usr/include/
/usr/include/linux/fiemap.h: __u64 fe_length; /* length in bytes for this extent */
This results in a type mismatch error on architectures like ppc64le:
fiemap.cc:31:35: note: deduced conflicting types for parameter 'const _Tp' ('long unsigned int' and 'long long unsigned int')
31 | fm.fm_length = min(fm.fm_length, FIEMAP_MAX_OFFSET - fm.fm_start);
| ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work around this by copying the macro into a uint64_t constant,
and not using the macro any more.
Fixes: https://github.com/Zygo/bees/issues/194
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The hash table is one of the few cases in bees where a non-trivial amount
of page cache memory will be used in a predictable way, so we can advise
the kernel about our IO demands in advance.
Use WILLNEED to prefetch hash table pages at startup.
Use DONTNEED to trigger writeback on hash table pages at shutdown.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In theory, we don't need the pread() loop, because the kernel will do a
better job with readahead().
In practice, we might still need the pread() code, as the readahead will
occur at idle IO priority, which could adversely affect bees performance.
More testing is required.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Like filefrag, fiemap was defaulting to FIEMAP_FLAG_SYNC, and providing no
option to turn it off. This prevents observation of delayed allocations,
making fiemap less useful.
Override the default flag setting so fiemap gets the current
(i.e. unflushed) extent map state.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
LOGICAL_INO_V2 has a maximum limit of 655050 references per extent.
Although it no longer has a crippling performance problem, at roughly
two seconds to process extent, it's too slow to be useful.
When an extent gains an absurd number of references, stop making any
more. Returning zero extent refs will make bees believe the extent
was deleted, and it will remove the block from the hash table.
This helps speed processing of highly duplicated large files like
VM images, and the cost of a slightly lower dedupe hit rate.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In 15ab981d9e "bees: replace uncaught_exception(), deprecated in C++17",
uncaught_exception() was replaced with current_exception(); however,
current_exception() is only valid after an exception has been captured
by a catch block.
BeesTracer wants to know about exceptions _before_ they are caught,
so current_exception() is not useful here.
Instead, conditionally compile using uncaught_exception() or
uncaught_exceptions(), selected by C++ standard version, and make
bees stack traces work again.
Fixes: 15ab981d9e "bees: replace uncaught_exception(), deprecated in C++17"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This allows these components to be used by test executables without
pulling in all of bees, and more rapidly iterate their code.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Some users are hitting an exception somewhere in crawl_transid, which
forces bees to return back to the transid_max calculation over and over.
Also out-of-range transids.
Add some BEESTRACE so we can see what we were doing in the exception
handler.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Currently if crawl throws an exception, we don't have basic information
about what was being crawled or even if the crawler was running at all.
These traces also help identify the causes of early exception failures.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This might be interesting information, though most of the motivation for
this evaporated when kernel 5.7 came out.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
There seem to be multiple ways to do readahead in Linux, and only some
of them work. Hopefully reading the actual data is one of them.
This is an attempt to avoid page-by-page reads in the generic dedupe code.
We load both extents into the VFS cache (read sequentially) and hope they
are still there by the time we call dedupe on them.
We also call readahead(2) and hopefully that either helps or does nothing.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This enables us to correlate FD cache clears with external events such
as btrfs inode eviction storms.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Report the number of Task objects that currently exist as well as the number
on the global work queue.
THREADS (work queue 298 of 2385 tasks, 16 workers):
This helps spot leaks, since Task objects that are blocked on other Task
post-exec queues are otherwise invisible.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Change documentation and comments to use the word "dedupe," not "dedup"
as found in circa-3.15 kernel sources.
No changes in code or program output--if they used "dedup" before, they
will continue to be spelled "dedup" now.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Higher CPU core counts became more common, and kernel bugs became less
common, since the arbitrary 8-thread limit was introduced. We can remove
the limit now, and treat any remaining scaling inefficiency as a bug to
be removed.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The dependency was missing, so changes to the library would not trigger
a rebuild of the bees binary.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Support for multiple BeesContext objects sharing a FdCache was wasting
significant space and atomic inc/dec memory cycles for no good reason
since the shared-FdCache feature was deprecated.
open_root and open_root_ino still need a BeesContext to work. Pass the
BeesContext pointer through the function object instead of the cache
key arguments.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
pthread_cancel doesn't really work properly. It was only being used in
bees to bring threads to a stop if the BeesContext is destroyed early.
It is frequently implicated in core dump reports because of the fragility
of the C++ iostream / C stdio / library infrastructure, particularly
surrounding upgrades on the host running bees. The pthread_cancel call
itself often simply fails even when it doesn't call terminate().
Defer creation of the status and progress threads until after the
BeesContext::start method is invoked. At that point, the existing
ask-threads-nicely-to-stop code is up and running, and normal condvars
can be used to bring bees to a stop, without having to resort to
pthread_cancel.
Since we're deleting half of the BeesContext constructor in this change,
let's remove the other half too, and put an end to the deprecated support
for multiple BeesContexts sharing a process. It's still possible to run
multiple BeesContexts, but they will not share a FD cache. This will
allow the FD cache's keys to become smaller and hopefully save some
memory later on.
Fixes: #171
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The weird things distros do to the path where uuid.h gets installed
have broken bees builds for the last time.
We were only using uuid to support a legacy feature that was removed
over four years ago.
Hypothetical users who are upgrading directly from bees v0.1 should
probably restart all the crawlers anyway--there were bugs. Also, if any
such users exist, I respect their tremendous patience with the horrible
performance all these years--bees got about 30x faster since v0.1.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The Linux kernel's btrfs headers are better than the libbtrfs-dev headers:
- the libbtrfs-dev headers have C++ language compatibility issues
- upstream version in Linux kernel is more accurate and up to date
- macros in libbtrfs-dev's ctree.h hide information that would
enable bees to perform runtime buffer length checking
- enum types whose presence cannot be detected with #ifdef
When accessing members of metadata items from the filesystem, we want
to verify that the member we are accessing is within the boundaries of
the item that was retrieved; otherwise, a memory access violation may
occur or garbage may be returned to the caller. A simple C++ template,
given a pointer to a structure member and a buffer, can determine that
the buffer contains enough bytes to safely access a struct member.
This was implemented back in 2016, but left unused due to ctree.h issues.
Some btrfs metadata structures have variable length despite using a
fixed-size in-memory structure. The members that appear earliest in
the structure contain information about which following members of the
structure are used. The item stored in the filesystem is truncated after
the last used member, and all following members must not be accessed.
'btrfs_stack_*' accessor macros obscure the memory boundaries of the
members they access, which makes it impossible for a C++ template to
verify the memory access. If the template checks the length of the
entire structure, it will find an access violation for variable-length
metadata items because the item is rarely large enough for the entire
structure.
Get rid of all the libbtrfs-dev accessor macros and reimplement them
with the necessary buffer length checks.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Apparently it is missing in newer Linux headers, making
builds fail. We don't need it, so remove it.
Closes: #160
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Make these workarounds configurable in src/bees.h instead of #if 0
code blocks. Someday we'll make the constants in bees.h configurable
through a file or similar.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
fiewalk and fiemap depend on a lot of crucible, and incremental builds
fail hard without proper dependency tracking.
All binaries must be rebuilt when makeflags changes. This dependency
exists already in lib and test, but src was missing.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>