1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-17 13:25:45 +02:00

694 Commits

Author SHA1 Message Date
Zygo Blaxell
a59a02174f table: add a simple text table renderer
This should help clean up some of the uglier status outputs.

Supports:

 * multi-line table cells
 * character fills
 * sparse tables
 * insert, delete by row and column
 * vertical separators

and not much else.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
e22653e2c6 docs: remove "matched_" prefix event counters
We can no longer reliably determine the number of hash table matches,
since we'll stop counting after the first one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
44810d6df8 scan_one_extent: remove the unreadahead after benchmark results
That unreadahead used to result in a 10% hit on benchmarks.  Now it's
closer to 75%.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
8f92b1dacc BeesRangePair: drop the _really_ expensive toxic extent workaround
We were doing a `LOGICAL_INO` ioctl on every _block_ of a matching extent,
just to see how long it takes.  It takes a while!

This could be modified to do an ioctl with the `IGNORE_OFFSET` flag,
once per new extent, but the kernel bug was fixed a long time ago, so
we can start removing all the toxic extent code.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
0b974b5485 scan_one_extent: in skip/scan lines, log whether extent is compressed
Useful for debugging the compressed-zero-block cases.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
ce0367dafe scan_one_extent: reduce the number of LOGICAL_INO calls before finding a duplicate block range
When we have multiple possible matches for a block, we proceed in three
phases:

1.  retrieve each match's extent refs and put them in a list,
2.  iterate over the list converting viable block matches into range matches,
3.  sort and flatten the list of range matches into a non-overlapping
list of ranges that cover all duplicate blocks exactly once.

The separation of phase 1 and 2 creates a performance issue when there
are many block matches in phase 1, and all the range matches in phase
2 are the same length.  Even though we might quickly find the longest
possible matching range early in phase 2, we first extract all of the
extent refs from every possible matching block in phase 1, even though
most of those refs will never be used.

Fix this by moving the extent ref retrieval in phase 1 into a single
loop in phase 2, and stop looping over matching blocks as soon as any
dedupe range is created.  This avoids iterating over a large list of
blocks with expensive `LOGICAL_INO` ioctls in an attempt to improve the
match when there is no hope of improvement, e.g. when all match ranges
are 4K and the content is extremely prevalent in the data.

If we find a matched block that is part of a short matching range,
we can replace it with a block that is part of a long matching range,
because there is a good chance we will find a matching hash block in
the long range by looking up hashes after the end of the short range.
In that case, overlapping dedupe ranges covering both blocks in the
target extent will be inserted into the dedupe list, and the longest
matches will be selected at phase 3.  This usually provides a similar
result to that of the loop in phase 1, but _much_ more efficiently.

Some operations are left in phase 1, but they are all using internal
functions, not ioctls.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
54ed6e1cff docs: event counter updates after fixing counter names and scan_one_extent improvements
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
24b08ef7b7 scan_one_extent: eliminate nuisance dedupes, drop caches after reading data
A laundry list of problems fixed:

 * Track which physical blocks have been read recently without making
 any changes, and don't read them again.

 * Separate dedupe, split, and hole-punching operations into distinct
 planning and execution phases.

 * Keep the longest dedupe from overlapping dedupe matches, and flatten
 them into non-overlapping operations.

 * Don't scan extents that have blocks already in the hash table.
 We can't (yet) touch such an extent without making unreachable space.
 Let them go.

 * Give better information in the scan summary visualization:  show dedupe
 range start and end points (<ddd>), matching blocks (=), copy blocks
 (+), zero blocks (0), inserted blocks (.), unresolved match blocks
 (M), should-have-been-inserted-but-for-some-reason-wasn't blocks (i),
 and there's-a-bug-we-didn't-do-this-one blocks (#).

 * Drop cached data from extents that have been inserted into the hash
 table without modification.

 * Rewrite the hole punching for uncompressed extents, which apparently
 hasn't worked properly since the beginning.

Nuisance dedupe elimination:

 * Don't do more than 100 dedupe, copy, or hole-punch operations per
 extent ref.

 * Don't split an extent or punch a hole unless dedupe would save at
 least half of the extent ref's size.

 * Write a "skip:" summary showing the planned work when nuisance
 dedupe elimination decides to skip an extent.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
97eab9655c types: add shrink_begin and shrink_end methods for BeesFileRange and BeesRangePair
These allow trimming of overlapping dedupes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
05bf1ebf76 counters: fix counter names for scan_eof, scan_no_fd, scanf_deferred_inode
This code gets moved around from time to time and ends up with the
wrong prefix.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
606ac01d56 multilock: allow turning it off
Add a master switch to turn off the entire MultiLock infrastructure for
testing, without having to remove and add all the individual entry points.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
72c3bf8438 fs: handle ENOENT within lib
This prevents the storms of exceptions that occur when a subvol is
deleted.  We simply treat the entire tree as if it was empty.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
72958a5e47 btrfs-tree: accessors for TreeFetcher classes' type and tree values
Sometimes we have a generic TreeFetcher and we need to know which tree
it came from.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
f25b4c81ba btrfs-tree: add root refs and extent flags fields
Lazily filling in accessor methods for btrfs objects as needed by bees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
a64603568b task: fix try_lock argument description
try_lock allows specification of a different Task to be run instead of
the current Task when the lock is busy.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
33cde5de97 bees: increase file cache size limits
With some extents having 9999 refs, we can use much larger caches for
file descriptors.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
5414c7344f docs: resolve_overflow limit is only 655050 when BTRFS_MAX_EXTENT_REF_COUNT is
Use the current header value in the doc.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
8bac00433d bees: reduce extent ref limit to 9999
Originally the limit was 2730 (64KiB worth of ref pointers).  This limit
was a little too low for some common workloads, so it was then raised by
a factor of 256 to 699050, but there are a lot of problems with extent
counts that large.  Most of those problems are memory usage and speed
problems, but some of them trigger subtle kernel MM issues.

699050 references is too many to be practical.  Set the limit to 9999,
only 3-4x larger than the original 2730, to give up on deduplication
when each deduped ref reduces the amount of space by no more than 0.01%.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
088cbc951a docs: event counter updates after readahead sanity improvements
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
e78e05e212 readahead: inject more sanity at the foundation of an insane architecture
This solves a third bad problem with bees reads:

3.  The architecture above the read operations will issue read requests
for the same physical blocks over and over in a short period of time.

Fixing that properly requires rewriting the upper-level code, but a
simple small table of recent read requests can reduce the effect of the
problem by orders of magnitude.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
8d08a3c06f readahead: inject some sanity at the foundation of an insane architecture
This solves some of the worst problems with bees reads:

1.  The kernel readahead doesn't work.  More precisely, it's much better
adapted for a very different use case:  a single thread alternating
between reading a file sequentially and processing the data that was read.
bees has multiple threads which compete for access to IO and then issue
reads in random order immediately after the call to readahead.  The kernel
uses idle ioprio scheduling for the readaheads, so the readaheads get
preempted by the random reads, or cancels the readaheads because the
data access pattern isn't sequential after the readahead was issued.

2.  Seeking drives perform terribly with multiple competing readers,
especially with btrfs striped profiles where the iops are broken into
tiny stripe-sized pieces.  At one point I intended to read the btrfs
device map and figure out which devices can be read in parallel, but to
make that useful, the user needs to have an array with multiple drives
in single profile, or 4+ drives in raid1 profile.  In all other cases,
the elaborate calculations always return the same result:  there can be
only one reader at a time.

This commit fixes both problems:

1.  Don't use the kernel readahead.  Use normal reads into a dummy
buffer instead.

2.  Allow only one thread to readahead at any time.  Once the read is
completed, the data is in the page cache, and all the random-order small
reads that bees does will hit the page cache, not a spinning disk.
In some cases we need to read two things close together, so add a
`bees_readahead_pair` which holds one lock across both reads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
cdcdf8e218 hash: use kernel readahead instead of bees_readahead to prefetch hash table
The hash table is read sequentially and from a single thread, so
the kernel's implementation of readahead is appropriate here.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
37f5b1bfa8 docs: add allocator regression in 6.0+ kernels
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
abe2afaeb2 context: when a task fails to acquire an extent lock, don't go ahead and scan the extent anyway
Commit c3b664fea54cfd8ac25411cbdb9536e4f24b008e ("context: don't forget
to retry locked extents") removed the critical return that prevents a
Task from processing an extent that is locked.

Put the return back.

Fixes: c3b664fea54cfd8ac25411cbdb9536e4f24b008e ("context: don't forget to retry locked extents")
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
792fdbbb13 fs: get rid of 16 MiB limit on dedupe requests
The kernel has not required a 16 MiB limit on dedupe requests since
v4.18-rc1 b67287682688 ("Btrfs: dedupe_file_range ioctl: remove 16MiB
restriction").

Kernels before v4.18 would truncate the request and return the size
actually deduped in `bytes_deduped`.  Kernel v4.18 and later will loop
in the kernel until the entire request is satisfied (although still
in 16 MiB chunks, so larger extents will be split).

Modify the loop in userspace to measure the size the kernel actually
deduped, instead of assuming the kernel will only accept 16 MiB.
On current kernels this will always loop exactly once.

Since we now rely on `bytes_deduped`, make sure it has a sane value.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
30a4fb52cb Revert "context: add experimental code for avoiding tiny extents"
because this problem is better solved elsewhere.

This reverts commit 11fabd66a84b6631fb39bb6b2c066c689351fc26.
2024-11-30 23:30:33 -05:00
Zygo Blaxell
90d7075358 usage: the default scan mode is 3 (recent)
The code and docs were changed some time ago, but not the usage message.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
faac895568 docs: add the 6.10..6.12 delayed refs bug
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
a7baa565e4 crawl: rename next_transid() to avoid confusion with BeesScanMode::next_transid()
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
b408eac98e trace: add file and line numbers all the way up the stack
These were added to crucible all the way back in 2018 (1beb61fb78ba
"crucible: error: record location of exception in what() message")
but it's even more useful in the stack tracer in bees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:27:24 -05:00
Zygo Blaxell
75131f396f context: reduce the size of LOGICAL_INO buffers
Since we'll never process more than BEES_MAX_EXTENT_REF_COUNT extent
references by definition, it follows that we should not allocate buffer
space for them when we perform the LOGICAL_INO ioctl.

There is some evidence (particularly
https://github.com/Zygo/bees/issues/260#issuecomment-1627598058) that
the kernel is subjecting the page cache to a lot of disruption when
trying allocate large buffers for LOGICAL_INO.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-04-17 23:14:35 -04:00
Zygo Blaxell
cfb7592859 usage: the default scan mode is 1 (independent)
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-04-17 23:14:35 -04:00
Zygo Blaxell
3839690ba3 lib: fix btrfs_data_container pointer casts for 32-bit userspace on 64-bit kernels
Apparently reinterpret_cast<uint64_t> sign-extends 32-bit pointers.
This is OK when running on a 32-bit kernel that will truncate the pointer
to 32 bits, but when running on a 64-bit kernel, the extra bits are
interpreted as part of the (now very invalid) address.

Use <uintptr_t> instead, which is unsigned, integer, and the same word
size as the arch's pointer type.  Ordinary numeric conversion can take
it from there, filling the rest of the word with zeros.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-04-17 23:07:41 -04:00
Zygo Blaxell
124507232f docs: add vmalloc bug to kernel bugs list
The bug is:

	v6.3-rc6: f349b15e183d mm: vmalloc: avoid warn_alloc noise caused by fatal signal

The fixes are:

	v6.4: 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
	v6.3.10: c189994b5dd3 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails

The bug has been backported to LTS, but the fix has not:

	v6.2.11: 61334bc29781 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
	v6.1.24: ef6bd8f64ce0 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
	v5.15.107: a184df0de132 mm: vmalloc: avoid warn_alloc noise caused by fatal signal

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
v0.10
2023-07-06 13:50:12 -04:00
Zygo Blaxell
3c5e13c885 context: log when LOGICAL_INO returns 0 refs
There was a bug in kernel 6.3 where LOGICAL_INO with IGNORE_OFFSET
sometimes fails to ignore the offset.  That bug is now fixed, but
LOGICAL_INO still returns 0 refs much more often than seems appropriate.

This is most likely because bees frequently deletes extents while there
is still work waiting for them in Task queues.  In this case, LOGICAL_INO
correctly returns an empty list, because every reference to some extent
is deleted, but the new extent tree with that extent removed is not yet
committed in btrfs.

Add a DEBUG-level log message and an event counter to track these events.
In the absence of a kernel bug, the debug message may indicate CPU time
was wasted performing a search whose outcome could have been predicted.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-07-06 12:54:33 -04:00
Zygo Blaxell
a6ca2fa2f6 docs: add IGNORE_OFFSET regression in 6.2..6.3 to kernel bugs list
This doesn't impact the current bees master, but it does break bees-next.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-07-06 12:49:36 -04:00
Zygo Blaxell
3f23a0c73f context: downgrade toxic extent workaround message
Toxic extents are much less of a problem now than they were in kernels
before 5.7.  Downgrade the log message level to reflect their lesser
importance.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-07-06 12:49:36 -04:00
Zygo Blaxell
d6732c58e2 test: GCC 13 fix for limits.cc
GCC complains that #include <cstdint> is missing, so add that.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-05-07 21:24:21 -04:00
Zygo Blaxell
75b2067cef btrfs-tree: fix build on clang++16
The "loops" variable isn't read (only set) if not built with extra
debug code.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-05-07 21:23:27 -04:00
Zygo Blaxell
da3ef216b1 docs: working around btrfs send issues isn't really a feature
The critical kernel bugs in send have been fixed for years.
The limitations that remain aren't bugs, and bees has no sustainable
workaround for them.

Also update copyright year range.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-03-07 10:25:51 -05:00
Zygo Blaxell
b7665d49d9 docs: fill in missing LTS backports for "1119a72e223f btrfs: tree-checker: do not error out if extent ref hash doesn't match"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-03-07 10:17:44 -05:00
Zygo Blaxell
717bdf5eb5 roots: make sure transid_max's computed value isn't max
We check the result of transid_max_nocache(), but not the result of
transid_max().  The latter is a computed result that is even more likely
to be wrong[citation needed].

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:45:29 -05:00
Zygo Blaxell
9b60f2b94d docs: add "missing" features that have been in development for some time already
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:42:42 -05:00
Zygo Blaxell
8978d63e75 docs: update GCC versions list and clarify markdown statement
I don't know if anyone else is testing GCC versions before 8.0 any more,
but I'm not.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:39:55 -05:00
Zygo Blaxell
82474b4ef4 docs: update front page
At least one user was significantly confused by "designed for large
filesystems".

The btrfs send workarounds aren't new any more.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:38:50 -05:00
Zygo Blaxell
73834beb5a docs: minor changes to how-it-works based on past user questions
Clarify that "too large" and "too small" are some distance away from each other.
The Goldilocks zone is _wide_.

The interval between cache drops is now shorter.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:37:37 -05:00
Zygo Blaxell
c92ba117d8 docs: various gotcha updates
Fixing the obviously wrong and out of date stuff.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:37:23 -05:00
Zygo Blaxell
c354e77634 docs: simplify the exit-with-SIGTERM description
The description now matches the code again.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:36:44 -05:00
Zygo Blaxell
f21569e88c docs: update the feature interactions page
Fixing the obviously out-of-date and no-longer-tested things.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:34:22 -05:00
Zygo Blaxell
3d5ebe4d40 docs: update kernel bugs and workarounds list for 6.2.0
Remove some of the repetition to make the document easier to edit.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-02-25 03:32:52 -05:00