mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

748 Commits

Author SHA1 Message Date
Zygo Blaxell
613ddc3c71 progress: rename "ctime" -> "tm_left"
"ctime", an abbreviation of "cycle time", collides with "ctime", an
abbreviation of "st_ctime", a well-known filesystem term.

"tm_left" fits in the column, so use that.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
v0.11-rc2
2025-01-06 12:50:50 -05:00
Zygo Blaxell
c3a39b7691 progress: rework the progress table after github discussion
* Report position within cycle in units that cannot be mistaken for size or percentage
* Put the total/maximum values in their own row
* Add a start time column
* Change column titles to reference "cycles"
* Use "idle" instead of "finished" when a crawler is not running
* Replace "transid" with "gen" because it's shorter

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:45:37 -05:00
Zygo Blaxell
58db4071de roots: use a non-idle task for next_transid
The scanners which finish early can become stuck behind scanners that are
able to keep the queue full.  Switch the next_transid task to the normal
Task queues so that we force scanners to restart on every new transaction,
possibly deferring already queued work to do so.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:36:53 -05:00
Zygo Blaxell
0d3e13cc5f context: report time in scan_one_extent
Add yet another field to the scan/skip report line:  the wallclock
time used to process the extent ref.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:36:53 -05:00
Zygo Blaxell
1af5fcdf34 roots: don't access a shared variable after releasing a lock
Access the local copy of `m_root_crawl_map` instead.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:36:53 -05:00
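The pattern this fix restores can be sketched as follows: copy the shared map while the lock is held, release the lock, and from then on touch only the local copy. Only `m_root_crawl_map` comes from the message. `Crawl` and `RootCrawlRegistry` are hypothetical stand-ins, not bees's BeesRoots.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

struct Crawl { uint64_t root; };  // hypothetical per-subvol crawl state

class RootCrawlRegistry {         // hypothetical, not bees's BeesRoots
public:
    void add(uint64_t root) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_root_crawl_map[root] = std::make_shared<Crawl>(Crawl{root});
    }

    // Visit every crawler without holding the lock during the callback.
    template <class F>
    void for_each_crawl(F &&fn) {
        std::unique_lock<std::mutex> lock(m_mutex);
        const auto crawl_map_copy = m_root_crawl_map;  // copy under the lock
        lock.unlock();
        // Only the local copy is safe to use now: another thread may be
        // modifying m_root_crawl_map as soon as the lock is released.
        for (const auto &entry : crawl_map_copy) fn(*entry.second);
    }

private:
    std::mutex m_mutex;
    std::map<uint64_t, std::shared_ptr<Crawl>> m_root_crawl_map;
};
```

Copying `shared_ptr`s keeps the copy cheap. It also means a callback can call `add()` without deadlocking, because the lock is no longer held.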
Zygo Blaxell
87472b6086 extent scan: don't put non-data block groups in the data extent map
The total data size should not include metadata or system block groups,
and already does not; however, we still have these block groups in the map
for mapping the crawl pointer to a logical offset within the filesystem.

Rearrange a few lines around the `if` statement so that the map doesn't
contain anything it should not.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:32:48 -05:00
Zygo Blaxell
ca351d389f extent scan: pick the right block groups for mixed-bg filesystems
The progress indicator was failing on a mixed-bg filesystem because those
filesystems have block groups which have both _DATA and _METADATA bits,
and the filesystem size calculation was excluding block groups that have
_METADATA set.  It should exclude block groups that have _DATA not set.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
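Here is a minimal sketch of the corrected test, folded together with the map change described in 87472b6086 above. The flag values match the kernel's `BTRFS_BLOCK_GROUP_DATA`, `_SYSTEM` and `_METADATA` constants. The `BlockGroup` struct and function names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Block group type bits (same values as the kernel's BTRFS_BLOCK_GROUP_*).
constexpr uint64_t BG_DATA     = 1ULL << 0;
constexpr uint64_t BG_SYSTEM   = 1ULL << 1;
constexpr uint64_t BG_METADATA = 1ULL << 2;

struct BlockGroup {   // hypothetical, simplified block group record
    uint64_t logical; // logical start address
    uint64_t length;  // size in bytes
    uint64_t flags;   // type bits
};

// A block group holds data if and only if its DATA bit is set.  Testing
// "METADATA not set" instead would wrongly reject mixed block groups,
// which carry both the DATA and METADATA bits.
bool is_data_bg(uint64_t flags) {
    return (flags & BG_DATA) != 0;
}

// Collect only data block groups, keyed by logical address, and return
// their total size.  The map insert sits inside the same `if` as the size
// accumulation, so metadata and system groups never reach the map.
uint64_t collect_data_bgs(const std::vector<BlockGroup> &bgs,
                          std::map<uint64_t, BlockGroup> &data_map) {
    uint64_t total = 0;
    for (const auto &bg : bgs) {
        if (is_data_bg(bg.flags)) {
            data_map[bg.logical] = bg;
            total += bg.length;
        }
    }
    return total;
}
```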
Zygo Blaxell
1f0b8c623c options: improve message when too many--or too few--path arguments given
Running bees with no arguments complains about "Only one" path argument.
Replace this with "Exactly one" which uses similar terminology to other
btrfs tools.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
Zygo Blaxell
74296c644a options: return EXIT_SUCCESS after displaying help message
`getopt_long` already supplies a message when an option cannot be parsed,
so there isn't a need to distinguish option parse failures from help
requests.

Fixes: https://github.com/Zygo/bees/pull/277
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
Zygo Blaxell
231593bfbc throttle: don't hold the multilock during throttle
Release the lock before entering the throttle sleep, so that other
threads can still run.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
Zygo Blaxell
d4900cc5d5 docs: default throttle is zero
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
Zygo Blaxell
81bbf7e1d4 throttle: set default to 0.0
Longer latency testing runs are not showing a consistent gain from a
throttle factor of 1.0.  Make the default more conservative.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
Zygo Blaxell
bd9dc0229b docs: add --throttle-factor option
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:15:37 -05:00
Zygo Blaxell
2a1ed0b455 throttle: track time values more closely
Decaying averages by 10% every 5 minutes gives roughly a half-hour
half-life to the rolling average.  Speed that up to once per minute.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:14:31 -05:00
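The half-life figure can be checked directly. Assume the average is multiplied by (1 - 0.10) once per interval. Solving (0.9)^n = 1/2 gives n ≈ 6.58 intervals. That is about 33 minutes at a 5-minute interval and about 6.6 minutes at a 1-minute interval. The helper below is only a worked calculation, not bees code.

```cpp
#include <cmath>

// Half-life of a rolling average that is multiplied by (1 - decay) once
// per interval: solve (1 - decay)^n = 1/2 for n, then convert to minutes.
double half_life_minutes(double decay, double interval_minutes) {
    const double intervals = std::log(0.5) / std::log(1.0 - decay);
    return intervals * interval_minutes;
}
```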
Zygo Blaxell
d160edc15a throttle: add --throttle-factor option to control throttling factor
Also change the initializer syntax for the option list to use C99
compound literals.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-03 23:13:51 -05:00
Zygo Blaxell
e79b242ce2 options: clean up the parser, prepare for new options with no short form
We're not adding any more short options, but the debugging code doesn't
work with optvals above 255.  Also clean up constness and variable
lifetimes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-16 23:32:18 -05:00
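The option-handling changes above (74296c644a, 1f0b8c623c, d160edc15a and this commit) can be illustrated with a sketch built on glibc's `getopt_long`. It shows a long-only option with an optval above 255, `EXIT_SUCCESS` after help, and an "exactly one path" check. The `OPT_THROTTLE_FACTOR` value, the `Options` struct and the message wording are assumptions, not bees's actual parser.

```cpp
#include <getopt.h>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical value for a long-only option.  Values above 255 can never
// collide with a short option character.
enum : int { OPT_THROTTLE_FACTOR = 256 };

struct Options {
    double throttle_factor = 0.0;  // default 0.0, as set in 81bbf7e1d4
    std::string path;
};

// Returns true if the program should continue; otherwise the caller should
// exit with `exit_code`.
bool parse_options(int argc, char *argv[], Options &opts, int &exit_code) {
    static const struct option long_opts[] = {
        { "help",            no_argument,       nullptr, 'h' },
        { "throttle-factor", required_argument, nullptr, OPT_THROTTLE_FACTOR },
        { nullptr,           0,                 nullptr, 0 },
    };
    int c;
    while ((c = getopt_long(argc, argv, "h", long_opts, nullptr)) != -1) {
        switch (c) {
        case OPT_THROTTLE_FACTOR:
            opts.throttle_factor = std::strtod(optarg, nullptr);
            break;
        case 'h':
            std::printf("Usage: %s [options] fs-root-path\n", argv[0]);
            exit_code = EXIT_SUCCESS;  // asking for help is not an error
            return false;
        default:
            // getopt_long has already printed a diagnostic.
            exit_code = EXIT_FAILURE;
            return false;
        }
    }
    if (argc - optind != 1) {
        std::fprintf(stderr, "Exactly one path argument required\n");
        exit_code = EXIT_FAILURE;
        return false;
    }
    opts.path = argv[optind];
    return true;
}
```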
Zygo Blaxell
ea45982293 throttle: add delays to match deferred request rate to btrfs completion rate
Measure the time spent running various operations that extend btrfs
transaction completion times (`LOGICAL_INO`, tmpfiles, and dedupe)
and arrange for each operation to run for not less than the average
amount of time by adding a sleep after each operation that takes less
than the average.

The delay after each operation is intended to slow down the rate of
deferred and long-running requests from bees to match the rate at which
btrfs is actually completing them.  This may help avoid big spikes in
latency if btrfs has so many requests queued that it has to force a
commit to release memory.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-16 23:32:18 -05:00
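The padding technique described above can be sketched like this, assuming an exponentially weighted running average. The class name and weight are hypothetical, and bees's own bookkeeping and decay schedule may differ. Following 231593bfbc, the sleep should happen with no locks held.

```cpp
#include <chrono>
#include <thread>

// Hypothetical per-operation throttle: keeps a rolling average of how long
// an operation takes and pads faster-than-average runs with a sleep, so
// each run occupies at least the average time.
class OpThrottle {
public:
    using duration = std::chrono::duration<double>;  // seconds

    explicit OpThrottle(double weight = 0.1) : m_weight(weight) {}

    // Record one run; return how long to sleep after it.
    duration record(duration elapsed) {
        duration delay = duration::zero();
        if (m_have_avg) {
            if (elapsed < m_avg) delay = m_avg - elapsed;
            m_avg = m_avg * (1.0 - m_weight) + elapsed * m_weight;
        } else {
            m_avg = elapsed;  // the first sample seeds the average
            m_have_avg = true;
        }
        return delay;
    }

    // Time `op`, then sleep off the padding.  Call this without holding
    // any locks, so other threads can run during the sleep.
    template <class F>
    void run(F &&op) {
        const auto start = std::chrono::steady_clock::now();
        op();
        const duration elapsed = std::chrono::steady_clock::now() - start;
        std::this_thread::sleep_for(record(elapsed));
    }

private:
    double m_weight;
    bool m_have_avg = false;
    duration m_avg = duration::zero();
};
```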
Zygo Blaxell
f209cafcd8 bees: bump the file limits again, 512k files and 64k dirs
Test machines keep blowing past the 32k file limit.  16 worker
threads at 10,000 files each is much larger than 32k.

Other high-FD-count services like DNS servers ask for million-file
rlimits.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-16 22:54:12 -05:00
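Here is a sketch of raising the open-file limit with the POSIX `getrlimit`/`setrlimit` calls. An unprivileged process can raise its soft limit only up to its hard limit. Reading "512k" as 512 * 1024 is an assumption, and this is not bees's startup code.

```cpp
#include <sys/resource.h>
#include <algorithm>
#include <stdexcept>

// Raise the soft RLIMIT_NOFILE toward `wanted`, capped at the hard limit,
// and never lower it.  Returns the soft limit now in effect.
rlim_t raise_nofile_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        throw std::runtime_error("getrlimit(RLIMIT_NOFILE) failed");
    const rlim_t target = std::min(wanted, lim.rlim_max);
    if (lim.rlim_cur >= target)
        return lim.rlim_cur;  // already high enough
    lim.rlim_cur = target;
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0)
        throw std::runtime_error("setrlimit(RLIMIT_NOFILE) failed");
    return target;
}
```

A call such as `raise_nofile_limit(512 * 1024)` would request the 512k figure from the commit title.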
Zygo Blaxell
c4b31bdd5c extent scan: no need for "No ref for extent" debug message
While a snapshot is being deleted, there will be a continuous stream of
"No ref for extent" messages.  This is a common event that does not need
to be reported.

There is an analogous situation when a call to open() fails with ENOENT.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-14 15:02:39 -05:00
Zygo Blaxell
08fe145988 context: wait for btrfs send to finish, then try dedupe again
Dedupe is not possible on a subvol where a btrfs send is running:

    BTRFS warning (device dm-22): cannot deduplicate to root 259417 while send operations are using it (1 in progress)

btrfs informs a process with EAGAIN that a dedupe could not be performed
due to a running send operation.

It would be possible to save the crawler state at the affected point,
fork a new crawler that avoids the subvol under send, and resume the
crawler state after a successful dedupe is detected; however, this only
helps the intersection of the set of users who have unrelated subvols
that don't share extents, and the set of users who cannot simply delay
dedupe until send is finished.  The simplest approach is to simply stop
and wait until the send goes away.

The simplest approach is taken here.  When a dedupe fails with EAGAIN,
affected Tasks will poll, approximately once per transaction, until the
dedupe succeeds or fails with a different error.

bees dedupe performance corresponds with the availability of subvols that
can accept dedupe requests.  While the dedupe is paused, no new Tasks can
be performed by the worker thread.  If subvols are small and isolated
from the bulk of the filesystem data, the result will be a small, partial
loss of dedupe performance during the send as some worker threads
get stuck on the sending subvol.  If subvols heavily share extents with
duplicate data in other subvols, worker threads will all become blocked,
and the entire bees process will pause until at least some of the running
sends terminate.

During the polling for btrfs send, the dedupe Task will hold its dst
file open.  This open FD won't interfere with snapshot or file delete
because send subvols are always read-only (it is not possible to delete
a file on a RO subvol, open or otherwise) and send itself holds the
affected subvol open, preventing its deletion.  Once the send terminates,
the dedupe will terminate soon after, and the normal FD release can occur.

This pausing during btrfs send is unrelated to the
`--workaround-btrfs-send` option, although `--workaround-btrfs-send` will
cause the pausing to trigger less often.  It applies to all scan modes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-14 14:51:28 -05:00
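The polling behaviour can be sketched as a loop that retries only while the result is EAGAIN. Here `attempt` returns 0 or an errno value. `wait_for_next_transaction` is a hypothetical stand-in for however the caller paces retries (the message says roughly once per transaction). Neither function is part of bees.

```cpp
#include <cerrno>
#include <functional>

// Retry an operation for as long as it fails with EAGAIN, waiting between
// attempts.  Success (0) or any other error ends the loop and is returned.
int retry_while_eagain(const std::function<int()> &attempt,
                       const std::function<void()> &wait_for_next_transaction) {
    for (;;) {
        const int err = attempt();
        if (err != EAGAIN) return err;
        wait_for_next_transaction();
    }
}
```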
Zygo Blaxell
bb09b1ab0e roots: drop method transid_re
There are no callers of this method any more, and it exposes more
of BeesRoots than we really want things to have access to.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-13 23:19:43 -05:00
Zygo Blaxell
94d9945d04 roots: move the transid cache update into transid_max_nocache()
All callers of the `transid_max_nocache` method update `m_transid_re`
with the return value, so do that in `transid_max_nocache` itself.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-13 23:19:43 -05:00
Zygo Blaxell
a02588b16f time: add more methods to support dynamic rate throttling
* Allow RateLimiter to change rate after construction.
* Check range of rate argument in constructor.
* Atomic increment for RateEstimator.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
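The three items above can be illustrated with simplified, hypothetical classes: a limiter whose rate can change after construction, a range check on the rate, and a lock-free counter. These are not bees's RateLimiter and RateEstimator. Treating "in range" as "strictly positive" is an assumption.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <stdexcept>

// Hypothetical rate limiter whose rate may be changed after construction.
class SimpleRateLimiter {
public:
    explicit SimpleRateLimiter(double r) : m_rate(checked(r)) {}

    void rate(double r) {
        const double ok = checked(r);  // validate before taking the lock
        std::lock_guard<std::mutex> lock(m_mutex);
        m_rate = ok;
    }
    double rate() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_rate;
    }
    // Seconds between events at the current rate.
    double interval() const { return 1.0 / rate(); }

private:
    static double checked(double r) {
        // !(r > 0) also rejects NaN.
        if (!(r > 0.0)) throw std::invalid_argument("rate must be positive");
        return r;
    }
    mutable std::mutex m_mutex;
    double m_rate;
};

// Event counter that many threads can bump without taking a lock.
class SimpleRateEstimator {
public:
    void increment(uint64_t n = 1) { m_count.fetch_add(n, std::memory_order_relaxed); }
    uint64_t count() const { return m_count.load(); }

private:
    std::atomic<uint64_t> m_count{0};
};
```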
Zygo Blaxell
21cedfb13e bytevector: rename the argument to operator[] to be more descriptive
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
b9abcceacb progress: move the "finished" tag to a column where it won't obscure data
The "done" pointer and the "%done" fields are still useful because they
indicate _actual_ progress, not the work that has been _promised_.
So it is possible for a crawl to be "finished" (all extents queued)
but not "100.0000%" (some of those extents still active or in the queue).

"deferred" state isn't particularly useful, so drop it.

"finished" state implies no ETA, so that column is unused.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
31f3a8d67d progress: relabel the inaccurate ETA column
ETA is calculated using a sample obtained by snooping on bees's normal
crawling operations.

This sample is heavily biased and not representative of the entire
filesystem.  If the distribution of extent sizes in the filesystem is
not uniform, the ETA can be wildly wrong.

Collecting an accurate sample set would require extra IO and CPU time
which should be spent doing dedupes instead.

Explicitly label the ETA as inaccurate to avoid having too many users
report the same bug.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
9beb602b16 task: ignore paused status while calculating dynamic thread count
bees might be unpaused at any time, so make sure that the dynamic load
calculation is ready with a non-zero thread count.

This avoids a delay of up to 5 seconds when responding to SIGUSR2
when loadavg tracking is enabled.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
0580c10082 main: add support for pause (SIGUSR1) and resume (SIGUSR2)
These are simple on/off switches for the task queue.  They are lightweight
requests for bees to pause temporarily, and they allow bees to release
open files and save progress while paused.

These signals are an alternative to SIGSTOP and SIGCONT, or using the
cgroup freezer's FROZEN and THAWED states, which pause and resume the
bees process, but do not allow the bees process to release open files
or save progress.  Snapshot and file deletes can occur on the filesystem
while bees is paused by SIGUSR1 but not by SIGSTOP.

These signals are also an alternative to SIGTERM and restart, which
flush out the whole hash table and progress state on exit, and read
the whole table back into memory on restart.

This feature is experimental and may be replaced by a more general
configuration or runtime control mechanism in the future.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:01:19 -05:00
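The on/off switch can be sketched with POSIX `sigaction`, using one handler that sets a flag for the task queue to poll. The flag and function names are hypothetical, and the sketch assumes `std::atomic<bool>` is lock-free, as it is on common platforms. A real scheduler would also need to restart worker threads on resume, which is what 1cbc894e6f addresses.

```cpp
#include <signal.h>
#include <atomic>

// Set by the signal handler, polled by the task scheduler.  Storing to a
// lock-free atomic is safe inside a signal handler.
static std::atomic<bool> g_paused{false};

// SIGUSR1 pauses, SIGUSR2 resumes.
extern "C" void on_pause_signal(int sig) {
    g_paused.store(sig == SIGUSR1);
}

bool install_pause_handlers() {
    struct sigaction sa = {};
    sa.sa_handler = on_pause_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, nullptr) == 0 &&
           sigaction(SIGUSR2, &sa, nullptr) == 0;
}
```

Usage would then be `kill -USR1 <pid>` to pause and `kill -USR2 <pid>` to resume.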
Zygo Blaxell
1cbc894e6f task: start up more worker threads when unpausing
When paused, TaskConsumer threads will eventually notice the paused
condition and exit; however, there's nothing to restart threads when
exiting the paused state.

When unpausing, and while the lock is already held, create TaskConsumer
threads as needed to reach the target thread count.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:53:00 -05:00
Zygo Blaxell
d74862f1fc fs: set the correct nr_items to 0 in the ENOENT search case
Commit 72c3bf8438830b65cae7bdaff126053e562280e5 ("fs: handle ENOENT
within lib") was meant to prevent exceptions when a subvol is deleted.

If the search ioctl fails, the kernel won't set nr_items in the
ioctl output, which means `nr_items` still has the input value.  When
ENOENT is detected, `this->nr_items` is set to 0, then later `*this =
ioctl_ptr->key` overwrites `this->nr_items` with the original requested
number of items.

This replaced the ENOENT exception with an exception triggered by
interpreting garbage in the memory buffer.  The number of exceptions
was reduced because the memory buffers are frequently reused, but upper
layers would then reject the data or ignore it because it didn't match
the key range.

Fix by setting `ioctl_ptr->key.nr_items`, which then overwrites
`this->nr_items`, so the loop that extracts items from the ioctl data
gets the right number of items (i.e. zero).

Fixes: 72c3bf8438830b65cae7bdaff126053e562280e5 ("fs: handle ENOENT within lib")
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:48:15 -05:00
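The ordering bug can be reproduced in miniature. Clearing the local count is useless when a later copy from the ioctl buffer overwrites it. Clearing the count in the buffer survives the copy. `SearchKey`, `IoctlBuffer` and `Search` are hypothetical simplifications of the real search-key types.

```cpp
#include <cerrno>
#include <cstdint>

// Hypothetical, simplified stand-ins for the search key and ioctl buffer.
struct SearchKey {
    uint64_t min_objectid = 0;
    uint32_t nr_items = 0;  // in: max items wanted; out: items returned
};

struct IoctlBuffer {
    SearchKey key;  // result data would follow in the real buffer
};

struct Search : SearchKey {
    // Buggy order: clear our own count, then overwrite it with the key from
    // the buffer, which still holds the *requested* count after a failure.
    void finish_buggy(IoctlBuffer *ioctl_ptr, int err) {
        if (err == ENOENT) this->nr_items = 0;
        static_cast<SearchKey &>(*this) = ioctl_ptr->key;
    }
    // Fixed: clear the count in the buffer, so the copy brings in zero.
    void finish_fixed(IoctlBuffer *ioctl_ptr, int err) {
        if (err == ENOENT) ioctl_ptr->key.nr_items = 0;
        static_cast<SearchKey &>(*this) = ioctl_ptr->key;
    }
};
```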
Zygo Blaxell
e40339856f readahead: use the right parameter order when checking the range
In some cases the offset and size arguments were flipped when checking to
see if a range had already been read.  This would have been OK as long as
the same mistake had been made consistently, since `bees_readahead_check`
only does a cache lookup on the parameters, it doesn't try to use them to
read a file.  Alas, there was one case where the correct order was used,
albeit a relatively rare one.

Fix all the calls to use the correct order.

Also fix a comment:  the recent request cache is global to all threads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-04 11:17:44 -05:00
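This sketch shows why swapped arguments only hurt the hit rate. A recent-request cache keyed on (offset, size) treats a swapped pair as a different key and silently misses. The class is hypothetical, and unlike any practical cache it is unbounded. Only the "global to all threads" property comes from the message.

```cpp
#include <cstdint>
#include <mutex>
#include <set>
#include <tuple>

// Hypothetical recent-request cache shared by all threads.  Arguments must
// be passed in the same order for every lookup, or requests never match.
class RecentReadahead {
public:
    // Returns true if (offset, size) was already recorded; records it if not.
    bool check_and_insert(uint64_t offset, uint64_t size) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return !m_seen.insert(std::make_tuple(offset, size)).second;
    }

private:
    std::mutex m_mutex;
    std::set<std::tuple<uint64_t, uint64_t>> m_seen;  // no eviction here
};
```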
Zygo Blaxell
1dd96f20c6 fs: drop extra declaration of hexdump
hexdump was moved into a template in its own header years ago, but
the declaration of the implementation that used to be in fs.cc remains.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-04 11:17:44 -05:00
Zygo Blaxell
cd7a71aba3 hexdump: be a little more lock-friendly
hexdump processes a vector as a contiguous sequence of bytes, regardless
of V's value type, so hexdump should get a pointer and use uint8_t to
read the data.

Some vector types have a lock and some atomics in their operator[], so
let's avoid hammering those.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 23:39:33 -05:00
Zygo Blaxell
e99a505b3b bytevector: don't deadlock on operator<<
operator<< was a friend function that locked the ByteVector, then invoked
hexdump on the bytevector, which used ByteVector::operator[]...which
locked the ByteVector, resulting in a deadlock.

operator<< shouldn't be a friend function anyway.  Make hexdump use the
normal public access methods for ByteVector.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 23:39:33 -05:00
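Both hexdump changes can be sketched in one place. `hexdump` reads the buffer through a `uint8_t` pointer instead of calling a locking `operator[]` per element. The non-friend `operator<<` uses only public methods, so it never takes the non-recursive mutex twice. `LockedBytes` and its copy-under-lock `snapshot()` are hypothetical and not bees's ByteVector design.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Dump a contiguous buffer as bytes, regardless of the container's type.
void hexdump(std::ostream &os, const void *data, size_t len) {
    const uint8_t *p = static_cast<const uint8_t *>(data);
    char buf[4];
    for (size_t i = 0; i < len; ++i) {
        std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(p[i]));
        if (i) os << ' ';
        os << buf;
    }
}

// Hypothetical byte vector guarded by a non-recursive mutex.
class LockedBytes {
public:
    explicit LockedBytes(std::vector<uint8_t> v) : m_data(std::move(v)) {}
    uint8_t operator[](size_t i) const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_data.at(i);
    }
    // Copy the contents under the lock; callers work on the copy unlocked.
    std::vector<uint8_t> snapshot() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_data;
    }

private:
    mutable std::mutex m_mutex;
    std::vector<uint8_t> m_data;
};

// Not a friend: if this locked the mutex and then called operator[], the
// second lock attempt on the same thread would deadlock.
std::ostream &operator<<(std::ostream &os, const LockedBytes &bv) {
    const auto copy = bv.snapshot();
    hexdump(os, copy.data(), copy.size());
    return os;
}
```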
Zygo Blaxell
3e89fe34ed roots: avoid copying a BtrfsIoctlSearchKey
Although all the members of BtrfsExtentDataFetcher are theoretically
copyable, there's no need to actually make any such copy.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 16:54:14 -05:00
Zygo Blaxell
dc74766179 context: spell "progress" correctly
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-02 09:50:28 -05:00
Zygo Blaxell
3a33a5386b context: add a PROGRESS: header in $BEESSTATUS
Make it clearer where the progress information goes.

Also add placeholder text so the progress section isn't empty at startup,
when the progress hasn't been calculated yet.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 11:41:59 -05:00
Zygo Blaxell
69e9bdfb0f docs: post-5.7 toxic extent handling
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
v0.11-rc1
2024-12-01 00:17:52 -05:00
Zygo Blaxell
7a197e2f33 bees: post-kernel-5.7 toxic extent handling
Toxic extents are mostly gone in kernel 5.7 and later.  Increase the
timeout for toxic extent handling to reduce false positives, and remove
persistently stored toxic hashes from the hash table.

Toxic hashes are still stored nonpersistently to help mitigate problems
due to any remaining kernel bugs.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:52 -05:00
Zygo Blaxell
43d38ca536 extent scan: don't serialize dedupe and LOGICAL_INO when using extent scan mode
The serialization doesn't seem to be necessary for the extent scan mode.
No infinite loops in the kernel have been observed in the past two years,
despite never having used MultiLock for the extent scanner.

Leave the serialization for now on the subvol scanners.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:52 -05:00
Zygo Blaxell
7b0ed6a411 docs: default scan mode is 4, "extent"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
8d4d153d1d main: set default scan mode to mode 4 (EXTENT)
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
d5a6c30623 docs: old missing features are not missing any more
The extent scan mode has been implemented (partially, but close enough
to win benchmarks).

New features include several nuisance dedupe countermeasures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
25f7ced27b docs: add scan mode 4, "extent"
Extent is a different kind of scan mode, so introduce the concept of
the two kinds of scan mode, and rearrange the description of scan modes
along the new boundaries.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
c1af219246 progress: squeeze the progress table into 80 columns or less
We don't need the subvol numbers since they're only interesting to
developers.

We don't need both max and min sizes, pick one and drop the other.

Replace "16E" with "max"--it is the same number of characters, but
doesn't require the user to know what 1<<64 is off the top of their head.

Shorten "remain" to "todo" because sometimes those extra two columns
matter.

Drop the seconds field in ETA timestamps.  Long scan arrival times are
years away, and short scan arrival times are only updated once every
5 minutes, so the extra precision isn't useful.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
9c183c2c22 progress: put the progress table in the stats and status files
Make the progress information more accessible, without having to
enable full debug log and fish it out of the stream with grep.

Also increase the progress log level to INFO.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
59f8a467c3 extent scan: fix crawl_map creation
There are two crawl_maps in extent scan's next_transid:  one gets
initialized, the other gets used.  This works OK as long as bees is
resuming an existing scan, because the two maps are identical; however,
it fails if bees is starting without an existing set of crawl data,
and one of the two maps is empty or partially filled.

The failure is intermittent, as the crawl map is being populated at
the same time next_transid runs.  It will eventually be completed after
several transaction cycles, at which point bees runs normally.
It does add significant delays during startup for benchmarks.

There's only one crawl_map in extent scan, it always has the same
crawlers, and extent scan's `next_transid` creates it by itself.
Ignore the map from BeesRoots/BeesCrawl.

Also throw in some missing but helpful trace statements.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
9987aa8583 progress: estimate actual data sizes for progress report
Replace pointers in the "done" and "total" columns with estimated data
sizes for each size tier.  The estimation is based on statistics
collected from extents scanned during the current bees run.

Move the total size for the entire filesystem up to the heading.

Report the _completed_ position (i.e. the one that would be saved in
`beescrawl.dat`), not the _queued_ position (i.e. the one where the
next Task would be created in memory).

At the end of the data, the crawl pointer ends up at some random point
in the filesystem just after the newest extent, so the progress gets to
99.7% and then goes to some random value like 47% or 3%, not to 100%.
Report "deferred" in the "done" column when the crawler is waiting for
the next transid, and "finished" in the "%done" column when the crawler
has reached the end of the data.  Suppress the ETA when finished.  This
makes it clear that there's no further work to do for these crawlers.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
da32667e02 docs: add event counters for extent scan
Add a section for all the new extent scan event counters.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
8080abac97 extent scan: refactor BeesScanMode so derived classes decide their own scan scheduling
BeesScanModeExtent uses six scan Tasks instead of one, which leads
to awkwardness such as the do_scan method, which tells crawl_roots how
to do something it shouldn't need to know how to do anyway.

Move the crawl_roots logic into the ::scan methods themselves.

This also deletes the very popular "crawl_more ran out of data" message.
Extent scan explicitly indicates when a scan is complete, so there's
no longer a need to fish this message out of the log.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00