The bug is:
v6.3-rc6: f349b15e183d mm: vmalloc: avoid warn_alloc noise caused by fatal signal
The fixes are:
v6.4: 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
v6.3.10: c189994b5dd3 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
The bug has been backported to LTS, but the fix has not:
v6.2.11: 61334bc29781 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
v6.1.24: ef6bd8f64ce0 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
v5.15.107: a184df0de132 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
There was a bug in kernel 6.3 where LOGICAL_INO with IGNORE_OFFSET
sometimes fails to ignore the offset. That bug is now fixed, but
LOGICAL_INO still returns 0 refs much more often than seems appropriate.
This is most likely because bees frequently deletes extents while there
is still work waiting for them in Task queues. In this case, LOGICAL_INO
correctly returns an empty list, because every reference to some extent
is deleted, but the new extent tree with that extent removed is not yet
committed in btrfs.
Add a DEBUG-level log message and an event counter to track these events.
In the absence of a kernel bug, the debug message may indicate CPU time
was wasted performing a search whose outcome could have been predicted.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Toxic extents are much less of a problem now than they were in kernels
before 5.7. Downgrade the log message level to reflect their lesser
importance.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The critical kernel bugs in send have been fixed for years.
The limitations that remain aren't bugs, and bees has no sustainable
workaround for them.
Also update copyright year range.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We check the result of transid_max_nocache(), but not the result of
transid_max(). The latter is a computed result that is even more likely
to be wrong[citation needed].
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
At least one user was significantly confused by "designed for large
filesystems".
The btrfs send workarounds aren't new any more.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Clarify that "too large" and "too small" are some distance away from each other.
The Goldilocks zone is _wide_.
The interval between cache drops is now shorter.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Each object contains a 16 MiB buffer, which is very heavy for some
malloc implementations.
Keep the objects in a Pool so that their buffers are only allocated and
deallocated once in the process lifetime.
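
A minimal sketch of the pooling idea, assuming a hypothetical
BigBufferSketch object and PoolSketch class (neither is the crucible
Pool API): returned objects are parked on a free list and handed out
again, so the 16 MiB buffer is allocated once per object, not once
per use.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object that owns a 16 MiB buffer.
struct BigBufferSketch {
    std::vector<char> buf;
    BigBufferSketch() : buf(16 * 1024 * 1024) {}
};

// Hypothetical pool: keeps returned objects for reuse instead of freeing them.
class PoolSketch {
    std::mutex m_mutex;
    std::vector<std::unique_ptr<BigBufferSketch>> m_free;
public:
    // Reuse a parked object if there is one, otherwise allocate a new one.
    std::unique_ptr<BigBufferSketch> get()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_free.empty()) {
            return std::unique_ptr<BigBufferSketch>(new BigBufferSketch);
        }
        auto rv = std::move(m_free.back());
        m_free.pop_back();
        return rv;
    }

    // Park the object (and its buffer) for the next caller.
    void put(std::unique_ptr<BigBufferSketch> p)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_free.push_back(std::move(p));
    }
};
```

In this sketch the caller returns the object explicitly with put();
a handle that returns it automatically on destruction would be the
more idiomatic design.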
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Some malloc implementations will try to mmap() and munmap() large buffers
every time they are used, causing a severe loss of performance.
Nothing ever overrode the virtual methods, and there was no virtual
destructor, so the classes cause compiler warnings at build time when
used with a template that tries to delete pointers to them.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
ProgressTracker was only freeing memory for work items when they reach
the head of the work tracking queue. If the first work item takes
hours to complete, and thousands of items are processed every second,
this leads to millions of completed items tracked in memory at a time,
wasting gigabytes of system RAM.
Rewrite ProgressHolderState methods to keep only incomplete work items
in memory, regardless of the order in which they are added or removed.
Also fix the unit tests which were relying on the memory leak to work,
and add test cases for code coverage.
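
The idea of tracking only incomplete items can be sketched as follows.
This is an illustration under stated assumptions, not the
ProgressHolderState code: the class ProgressSketch and its methods
start(), finish(), oldest_incomplete() and tracked() are hypothetical
names. Completed items are erased immediately, so memory use is
proportional to the number of items in flight, not to the number
completed behind a slow first item.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical tracker: remembers only work items that are still running.
class ProgressSketch {
    mutable std::mutex m_mutex;
    std::multiset<uint64_t> m_incomplete;  // positions of items in flight
    uint64_t m_end = 0;                    // highest position ever started
public:
    void start(uint64_t pos)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_incomplete.insert(pos);
        m_end = std::max(m_end, pos);
    }

    // Forget the item as soon as it completes, in any order.
    void finish(uint64_t pos)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto it = m_incomplete.find(pos);
        if (it != m_incomplete.end()) {
            m_incomplete.erase(it);
        }
    }

    // Everything below this position is complete.
    uint64_t oldest_incomplete() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_incomplete.empty() ? m_end : *m_incomplete.begin();
    }

    // Number of items currently held in memory.
    size_t tracked() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_incomplete.size();
    }
};
```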
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Not sure what I was thinking, but the argument here should clearly
be uint64_t.
Fixes: https://github.com/Zygo/bees/issues/248
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
If the send workaround is enabled, it is possible for two threads (a
thread running the crawl_new task, and a thread attempting to apply the
send workaround) to access the same RootFetcher object at the same time.
That never ends well.
Give each function its own BtrfsRootFetcher object.
Fixes: https://github.com/Zygo/bees/issues/250
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
With SIGTERM and fast exit, the trickle writeback is less important.
We don't want to flood people's IO subsystems with continuous writes.
This really should be configurable at runtime.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Do rebuild bees-version.cc if libcrucible changes.
Don't rebuild bees-version.cc if it doesn't change.
Also use the standard suffix for new files.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
crucible::VERSION doesn't make much sense now that libcrucible no
longer exists as a shared library. Nothing ever referenced it, so
it can go away.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
According to ioctl_iflags(2):
    The type of the argument given to the FS_IOC_GETFLAGS and
    FS_IOC_SETFLAGS operations is int *, notwithstanding the
    implication in the kernel source file include/uapi/linux/fs.h
    that the argument is long *.
So this code doesn't work on big-endian 64-bit (be64) machines.
Also, Valgrind complains about it.
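
A minimal sketch of the corrected call, assuming a hypothetical helper
named get_inode_flags (not the bees function): the ioctl argument is
an int, as the man page specifies, so the kernel writes exactly the
bytes the caller reads back.

```cpp
#include <linux/fs.h>
#include <sys/ioctl.h>

// Hypothetical helper: returns true and fills `flags` on success.
// The argument passed to the kernel is int *, per ioctl_iflags(2).
bool get_inode_flags(int fd, int &flags)
{
    int attr = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &attr) != 0) {
        return false;  // errno holds the reason, e.g. EBADF or ENOTTY
    }
    flags = attr;
    return true;
}
```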
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
A subtle distinction, and not one that is particularly relevant to bees,
but it does make toolchains complain.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Another instance of the pattern where we derived a crucible class
from a btrfs struct. Make it an automatic variable instead.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This was fixed in
7f660f50b lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks
but apparently some copies live on.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
These tools are obsolete. fiemap was a thin wrapper around FIEMAP,
but FIEMAP is not useful on btrfs. fiewalk was a thin wrapper around
BtrfsExtentWalker, but development on BtrfsExtentWalker has been
abandoned.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When a hash table write fails, we skip over the write throttling because
we didn't report that we successfully wrote an extent. This can be bad
if the filesystem is full and the allocations for writes are burning a
lot of CPU time searching for free space.
We also don't retry the write later on since we assume the extent is
clean after a write attempt whether it was successful or not, so the
extent might not be written out later when writes are possible again.
Check whether a hash extent is dirty, and always throttle after
attempting the write.
If a write fails, leave the extent dirty so we attempt to write it out
the next time flush cycles through the hash table. During shutdown
this will reattempt each failing write once; after that, the updated
hash table data will be dropped.
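
The flush behavior described above can be sketched roughly like this.
The names ExtentSketch, flush_dirty_extents and write_extent are
hypothetical, and the fixed sleep stands in for whatever rate limiting
bees actually applies. The two points illustrated are that the dirty
flag is cleared only after a successful write, and that the throttle
runs after every attempt, successful or not.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-extent state: only the dirty flag matters here.
struct ExtentSketch {
    bool dirty = false;
};

// Attempt to write every dirty extent once.  Returns the number written.
size_t flush_dirty_extents(std::vector<ExtentSketch> &extents,
    const std::function<bool(size_t)> &write_extent,
    std::chrono::milliseconds throttle)
{
    size_t written = 0;
    for (size_t i = 0; i < extents.size(); ++i) {
        if (!extents[i].dirty) {
            continue;  // clean extents cost nothing
        }
        if (write_extent(i)) {
            extents[i].dirty = false;  // clean only after success
            ++written;
        }
        // Throttle after every attempt, including failed ones, so a
        // full filesystem does not turn the loop into a busy retry.
        std::this_thread::sleep_for(throttle);
    }
    return written;
}
```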
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Calling 'bees -m4' should not call 'std::terminate()', but it does.
Use catch_all instead. It will still pass the exit value to return
from main.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We only use BtrfsExtentInfo when it's exactly equivalent to the
base, so drop the derived class.
While we're here, fix BtrfsExtentSame::add so it uses a btrfs-compatible
uint64_t instead of an off_t.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
BEESTOOLONG was always reporting a size of zero, and the offset of the
end of the readahead region. Report the original size instead (and also
in BEESTRACE and BEESNOTE).
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Drop the crawl_restart counter; the event it counts doesn't happen here
(or anywhere else).
Add the crawl_again counter for extents that are restarted due to an
extent-level lock.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
libcrucible can deal with the Linux kernel and/or libc's thread name
limitations. No need to duplicate that work in bees.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
It turns out I've been using pthread_setname_np wrong the whole time:
* On Linux, the maximum thread name length is 15 characters.
  TASK_COMM_LEN is 16 bytes, and the last one is always 0.
  This is now hardcoded in many places and cannot be changed.
* pthread_setname_np doesn't return -errno, so DIE_IF_MINUS_ERRNO
  was the wrong macro. On the other hand, we never want to do anything
  differently when pthread_setname_np fails, so we never needed to
  check the return value.
Also, libc silently ignores attempts to set the thread name when it is too
long. That's almost certainly a libc bug, but libc probably suppresses
the error result for the same reasons I ignore the error result.
Wrap the pthread_setname function with a C++ std::string overload that
truncates the argument at 15 characters, so we at least get the first
part of the task name in the thread name field. Later commits can deal
with making the bees thread names shorter.
Also wrap pthread_getname for symmetry.
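
A minimal sketch of such wrappers on Linux with glibc. The function
names truncate_thread_name, set_thread_name and get_thread_name are
hypothetical, not the libcrucible names. The name is cut to 15
characters before the call, and the return values are deliberately
ignored.

```cpp
#include <pthread.h>
#include <cstddef>
#include <string>

// TASK_COMM_LEN is 16 bytes including the trailing NUL.
static const size_t thread_name_max = 15;

// Keep only the part of the name that fits in the thread name field.
std::string truncate_thread_name(const std::string &name)
{
    return name.substr(0, thread_name_max);
}

// Set the calling thread's name.  The result is ignored because there
// is nothing to do differently on failure.
void set_thread_name(const std::string &name)
{
    (void)pthread_setname_np(pthread_self(), truncate_thread_name(name).c_str());
}

// Read the calling thread's name back; empty string on failure.
std::string get_thread_name()
{
    char buf[thread_name_max + 1] = { 0 };
    (void)pthread_getname_np(pthread_self(), buf, sizeof(buf));
    return std::string(buf);
}
```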
Signed-off-by: Zygo Blaxell <bees@furryterror.org>