The bug is:
v6.3-rc6: f349b15e183d mm: vmalloc: avoid warn_alloc noise caused by fatal signal
The fixes are:
v6.4: 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
v6.3.10: c189994b5dd3 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
The bug has been backported to LTS, but the fix has not:
v6.2.11: 61334bc29781 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
v6.1.24: ef6bd8f64ce0 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
v5.15.107: a184df0de132 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
There was a bug in kernel 6.3 where LOGICAL_INO with IGNORE_OFFSET
sometimes fails to ignore the offset. That bug is now fixed, but
LOGICAL_INO still returns 0 refs much more often than seems appropriate.
This is most likely because bees frequently deletes extents while there
is still work waiting for them in Task queues. In this case, LOGICAL_INO
correctly returns an empty list, because every reference to some extent
is deleted, but the new extent tree with that extent removed is not yet
committed in btrfs.
Add a DEBUG-level log message and an event counter to track these events.
In the absence of a kernel bug, the debug message may indicate CPU time
was wasted performing a search whose outcome could have been predicted.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The critical kernel bugs in send have been fixed for years.
The limitations that remain aren't bugs, and bees has no sustainable
workaround for them.
Also update copyright year range.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
At least one user was significantly confused by "designed for large
filesystems".
The btrfs send workarounds aren't new any more.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Clarify that "too large" and "too small" are some distance away from each other.
The Goldilocks zone is _wide_.
The interval between cache drops is now shorter.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Also attempted to clarify the descriptions of the modes based on
feedback and questions from users over the years.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When two Tasks attempt to lock the same extent, append the later Task
to the earlier Task's post-exec work queue. This will guarantee that
all Tasks which attempt to manipulate the same extent will execute
sequentially, and free up threads to process other extents.
Similarly, if two scanner threads operate on the same inode, any dedupe
they perform will lock out other scanner threads in btrfs. Avoid this
by serializing Task objects that reference the same file.
This does theoretically use an unbounded amount of memory, but in practice
a Task that encounters a contended extent or inode quickly stops spawning
new Tasks that might increase the queue size, and all Tasks that might
contend for the same lock(s) end up on a single FIFO queue.
Note that the scope of inode locks is intentionally global, i.e. when
an inode is locked, it locks every inode with the same number in every
subvol. This avoids significant lock contention and task queue growth
when the same inode with the same file extents appear in snapshots.
Fixes: https://github.com/Zygo/bees/issues/158
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Kernels that needed the balance workaround frankly are too buggy
to run bees at all. The workaround also makes the locking stories
around logical_ino calls and process exit complicated, so get rid of
it completely.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
https://github.com/Zygo/bees/pull/209
Edited: regenerate docs for the downstream change in index.md.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
For one thing, it should _say_ that there are too many duplicates.
We were making the user read the manual to find that out.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In commit d9e3c0070b8e6b382b7956d286e43e0e6643f360 "context: stop creating
new refs when there are too many already" we added a new counter, but didn't
document it.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
In the current architecture we can't directly measure the physical extent
size, and we can't make good decisions with the extent data (reference)
item alone. If the early return is enabled here, there is a small speedup
and a large drop in dedupe hit rate, especially when extent splits occur.
Leave the early return commented for now, but collect the event statistics.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Two new tree mod log bugs #5 and #6 (uncovered by the zoned IO work,
though #6 has been seen in the wild on 5.10.29).
Tweak the next of some of the workarounds.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Change documentation and comments to use the word "dedupe," not "dedup"
as found in circa-3.15 kernel sources.
No changes in code or program output--if they used "dedup" before, they
will continue to be spelled "dedup" now.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Higher CPU core counts became more common, and kernel bugs became less
common, since the arbitrary 8-thread limit was introduced. We can remove
the limit now, and treat any remaining scaling inefficiency as a bug to
be removed.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The kernel from such an old distro version likely has several unfixed
bugs. Better not to support it at all.
Users who can upgrade the kernel are probably also sophisticated enough
to fix the build issues too.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The weird things distros do to the path where uuid.h gets installed
have broken bees builds for the last time.
We were only using uuid to support a legacy feature that was removed
over four years ago.
Hypothetical users who are upgrading directly from bees v0.1 should
probably restart all the crawlers anyway--there were bugs. Also, if any
such users exist, I respect their tremendous patience with the horrible
performance all these years--bees got about 30x faster since v0.1.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The slow backrefs performance improvement is confirmed by reports from
multiple users:
* Me (5.4.60 + backref patches, 5.7 to 5.11)
* https://github.com/Zygo/bees/issues/161 (5.8)
* https://github.com/Zygo/bees/issues/162 (5.8)
* IRC user S0rin (5.4.88 + backref patches)
The issue still exists, but at a significantly reduced scale: now about
2 ms of CPU per ref on a fast machine.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
A long time ago, when bees used dedicated threads to scan each subvol, the
calculation of the "dedup_unique_bytes" statistic was still wrong.
This stat can only be calculated when dedupe runs on extent data items
instead of extent reference items. Remove the stat variable until
that happens.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
There was a 4th tree mod log crash that showed up in testing. It can
be reproduced or eliminated by applying or reverting d2311e698578
("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
to a 5.4.x kernel before 5.4.54.
Unfortunately, the test can only run if several other patches that
fixed other bugs in d2311e698578 are applied or removed at the same time.
Commit d2311e698578 introduces a bug which destroys filesystems under test
long before tree mod log failures can be reproduced in testing. One of
those patches also fixes tree mod log issue #4. I do not know which one,
but since kernels after 5.1 cannot run without all of those patches, I do
not think it matters.
Tree mod issue #4 is the reason why the tree mod workaround is still
required on all kernels before 5.4. The issue still exists on older
LTS kernels, e.g. 4.9.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Rewrite the text related to 'btrfs send' to clarify that the send
workaround is no longer necessary to avoid kernel crashes, but still
useful because send and dedupe still do not work at the same time.
Replace "many backref code changes" with a specific commit reference,
and improve the grammar of some issue descriptions.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Apparently there's Github Flavored Markdown, and there's the markup
language that github uses, and they are distinct things.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Present known kernel bugs in table form with issue descriptions,
fixed and broken kernel versions, and references to fixes.
Update kernel version recommendations to include information on kernel
versions up to 5.8.14.
Reduce emphasis on data corruption bugs which are 1) two or more
years old now, and 2) much less bad than the bugs in kernel 5.1.
Add deprecation warning for kernels before 4.15.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>