This allows plugging in an ostream at run time so that we can audit all
the search calls we are doing.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Enable use of the ioctl to probe whether two fds refer to the same btrfs,
without throwing an exception.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
hexdump was moved into a template in its own header years ago, but
the declaration of the implementation that used to be in fs.cc remains.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Some malloc implementations will try to mmap() and munmap() large buffers
every time they are used, causing a severe loss of performance.
Nothing ever overrode the virtual methods, and there was no virtual
destructor, so they cause compiler warnings at build time when used with
a template that tries to delete pointers to them.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Another instance of the pattern where we derived a crucible class
from a btrfs struct. Make it an automatic variable instead.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This was fixed in
7f660f50b lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks
but apparently some copies live on.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We only use BtrfsExtentInfo when it's exactly equivalent to the
base, so drop the derived class.
While we're here, fix BtrfsExtentSame::add so it uses a btrfs-compatible
uint64_t instead of an off_t.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We already had a function that was _similar_, so add decoding for compress
type NONE, give it a less specific name, and declare it in fs.h.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Yet another build failure of the form:
error: flexible array member fiemap... not at end of struct crucible::Fiemap...
bees doesn't use fiemap any more, so the fixes here are minimal changes
to make it build, not shining examples of C++ class design.
Signer-off-by: Zygo Blaxell <bees@furryterror.org>
This fixes another build failure of the form:
error: flexible array member btrfs_... not at end of struct crucible::Btrfs...
Fixes: https://github.com/Zygo/bees/issues/236
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The base class thing was an ugly way to get around the lack of C99
compound literals in C++, and also to make the bare ioctls usable with
the derived classes.
Today, both clang and gcc have C99 compound literals, so there's no need
to do crazy things with memset. We never used the derived classes for
ioctls, and for this specific ioctl it would have been a very, very bad
idea, so there's no need to support that either. We do need to jump
through hoops for ostream& operator<<() but we had to do those anyway
as there are other members in the derived type.
So we can simply drop the base class, and build the args object on the
stack in `do_ioctl`. This also removes the need to verify initialization.
There's no bug here since the `info` member of the base class was
never used in place by the derived class, but new compilers reject the
flexible array member in the base class because the derived class makes
`info` be not at the end of the struct any more:
error: flexible array member btrfs_ioctl_same_args::info not at end of struct crucible::BtrfsExtentSame
Fixes: https://github.com/Zygo/bees/issues/232
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When we are searching the btrfs metadata trees, we usually want only
one type of item. If the last item in a search result is not of the
desired type, we can restart the search at the next possible key with
that item type, potentially skipping over some uninteresting items we
would otherwise have to fetch, process, and discard.
Also remove a bug in the previous next_min code that would skip over
items if the offset overflowed and the next objectid in the tree had a
lower item type number than the previous objectid. This doesn't seem
to be a bug that has ever happened, as it would require a file to roll
over in the offset field.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Switch various methods in fs to use ByteVector to cut down on the number
of slow allocations and copies.
Automatically determine the correct size for TREE_SEARCH_V2 buffers
based on the number of items requested, and grow the buffer as needed.
This eliminates the need to cache some objects that were heavy to create.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In fiemap.h the members of struct fiemap are declared as __u64, but the
FIEMAP_MAX_OFFSET macro is an unsigned long long value:
$ grep FIEMAP_MAX_OFFSET -r /usr/include/
/usr/include/linux/fiemap.h:#define FIEMAP_MAX_OFFSET (~0ULL)
$ grep fe_length -r /usr/include/
/usr/include/linux/fiemap.h: __u64 fe_length; /* length in bytes for this extent */
This results in a type mismatch error on architectures like ppc64le:
fiemap.cc:31:35: note: deduced conflicting types for parameter 'const _Tp' ('long unsigned int' and 'long long unsigned int')
31 | fm.fm_length = min(fm.fm_length, FIEMAP_MAX_OFFSET - fm.fm_start);
| ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work around this by copying the macro into a uint64_t constant,
and not using the macro any more.
Fixes: https://github.com/Zygo/bees/issues/194
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The weird things distros do to the path where uuid.h gets installed
have broken bees builds for the last time.
We were only using uuid to support a legacy feature that was removed
over four years ago.
Hypothetical users who are upgrading directly from bees v0.1 should
probably restart all the crawlers anyway--there were bugs. Also, if any
such users exist, I respect their tremendous patience with the horrible
performance all these years--bees got about 30x faster since v0.1.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The Linux kernel's btrfs headers are better than the libbtrfs-dev headers:
- the libbtrfs-dev headers have C++ language compatibility issues
- upstream version in Linux kernel is more accurate and up to date
- macros in libbtrfs-dev's ctree.h hide information that would
enable bees to perform runtime buffer length checking
- enum types whose presence cannot be detected with #ifdef
When accessing members of metadata items from the filesystem, we want
to verify that the member we are accessing is within the boundaries of
the item that was retrieved; otherwise, a memory access violation may
occur or garbage may be returned to the caller. A simple C++ template,
given a pointer to a structure member and a buffer, can determine that
the buffer contains enough bytes to safely access a struct member.
This was implemented back in 2016, but left unused due to ctree.h issues.
Some btrfs metadata structures have variable length despite using a
fixed-size in-memory structure. The members that appear earliest in
the structure contain information about which following members of the
structure are used. The item stored in the filesystem is truncated after
the last used member, and all following members must not be accessed.
'btrfs_stack_*' accessor macros obscure the memory boundaries of the
members they access, which makes it impossible for a C++ template to
verify the memory access. If the template checks the length of the
entire structure, it will find an access violation for variable-length
metadata items because the item is rarely large enough for the entire
structure.
Get rid of all the libbtrfs-dev accessor macros and reimplement them
with the necessary buffer length checks.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When we are using non-copying containers, we can't call resize() on them.
get_struct_ptr is essentially a pointer cast, so we will end up with a
pointer to a struct that extends beyond the boundaries of the container.
As long as the btrfs metadata is not corrupted, we should not have too
many problems.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Use uint8_t when we mean uint8_t, i.e. vector<uint8_t> instead of
vector<char>.
Add a template parameter instead of vector so we can swap in a
non-copying data type.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Define a local copy of the header that has fields for the csum type
and length, so we can build in places that haven't caught up to kernel
5.5 headers yet.
The reason why the csum type and length are not unconditionally filled
in eludes me. csum_length is necessarily non-zero, and the cost of
the conditional is worse than the cost of the copy, so the whole flags
dance is a WTF...but it's part of the kernel API now, so it's too late
to NAK it.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Perf blames this operator for >1% of instructions with -O2, and
70% of instructions without -O2.
Let the compiler inline the function.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
It is not possible to emulate extent-same by clone in a safe way.
EXTENT_SAME has been supported in btrfs since kernel 3.13, which
is much too old to contemplate running bees on.
Remove this dangerous and unused function.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
If we are not zero-filling containers then the overhead of allocating them
on each use is negligible. The effect that the thread_local containers
were having on RAM usage was very non-negligible.
Use dynamic containers (members or stack objects) for better control
of object lifetimes and much lower peak RAM usage. They're a tiny bit
faster, too.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Automatically fall back to LOGICAL_INO if LOGICAL_INO_V2 fails and no
_V2 flags are used.
Add methods to set the flags argument with build portability to older
headers.
Use thread_local storage for the somewhat large buffers used by
LOGICAL_INO_V2 (and other users of BtrfsDataContainer like INO_PATHS).
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This gets rid of some more big memsets. It may replace them
with a lot of tiny mallocs, though. If this turns out to be
a bad idea then at least we can easily revert the change.
We really do need some large buffers for BtrfsIoctlSearchKey in some
cases, but we don't need to zero them out first. Don't do that so we
save some CPU.
Reduce the default buffer size to 4K because most BISK users don't get
need much more than 1K. Set the buffer size explicitly to the product of
the number of items and the desired item size in the places that really
need a lot of items.
It turns out we never use a value for m_buf_size that isn't the default,
and we also never ask for more than a few thousand items; however,
we do spend a ton of time memsetting the huge buffer to zero.
I don't know what the ideal size is, but 16K is a far better guess
than 1MB. Let's reduce it for some immediate CPU benefit, and determine
what the size should be later.
Reported at https://github.com/Zygo/bees/issues/11
I accidentally did a pre-push verification on a 32-bit build host.
There were a surprisingly small number of problems, so fix them.
Bees now builds on a 32-bit host. Let's not update README just yet,
though: the 32-bit ioctl support fails immediately after startup on a
64-bit kernel.