mirror of https://github.com/Zygo/bees.git synced 2025-05-18 05:45:45 +02:00

774 Commits

Author SHA1 Message Date
Zygo Blaxell
b9abcceacb progress: move the "finished" tag to a column where it won't obscure data
The "done" pointer and the "%done" fields are still useful because they
indicate _actual_ progress, not the work that has been _promised_.
So it is possible for a crawl to be "finished" (all extents queued)
but not "100.0000%" (some of those extents still active or in the queue).

"deferred" state isn't particularly useful, so drop it.

"finished" state implies no ETA, so that column is unused.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
31f3a8d67d progress: relabel the inaccurate ETA column
ETA is calculated using a sample obtained by snooping on bees's normal
crawling operations.

This sample is heavily biased and not representative of the entire
filesystem.  If the distribution of extent sizes in the filesystem is
not uniform, the ETA can be wildly wrong.

Collecting an accurate sample set would require extra IO and CPU time
which should be spent doing dedupes instead.

Explicitly label the ETA as inaccurate to avoid having too many users
report the same bug.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
9beb602b16 task: ignore paused status while calculating dynamic thread count
bees might be unpaused at any time, so make sure that the dynamic load
calculation is ready with a non-zero thread count.

This avoids a delay of up to 5 seconds when responding to SIGUSR2
when loadavg tracking is enabled.
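
A rough sketch of the idea (hypothetical names and formula, not the real
load-tracking code): the thread-target calculation keeps producing a
non-zero count even while paused, so workers can start immediately on
SIGUSR2.

    #include <algorithm>
    #include <cstddef>
    #include <stdlib.h>

    // Illustrative only: derive a worker-thread target from the 1-minute
    // load average, without forcing it to zero while bees is paused.
    size_t dynamic_thread_target(double target_load, size_t max_threads, size_t current)
    {
            double load = 0;
            double sample[1] = { 0 };
            if (getloadavg(sample, 1) == 1) load = sample[0];
            // Nudge the worker count toward the configured load target.
            double want = double(current) + (target_load - load);
            want = std::max(1.0, std::min(want, double(max_threads)));
            return size_t(want);    // stays >= 1, even while the queue is paused
    }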

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
0580c10082 main: add support for pause (SIGUSR1) and resume (SIGUSR2)
These are simple on/off switches for the task queue.  They are lightweight
requests for bees to pause temporarily, while still allowing bees to
release open files and save progress while paused.

These signals are an alternative to SIGSTOP and SIGCONT, or using the
cgroup freezer's FROZEN and THAWED states, which pause and resume the
bees process, but do not allow the bees process to release open files
or save progress.  Snapshot and file deletes can occur on the filesystem
while bees is paused by SIGUSR1 but not by SIGSTOP.

These signals are also an alternative to SIGTERM and restart, which
flush out the whole hash table and progress state on exit, and read
the whole table back into memory on restart.

This feature is experimental and may be replaced by a more general
configuration or runtime control mechanism in the future.
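
A minimal sketch of the mechanism (hypothetical names, not the actual
bees implementation): the handlers only flip an atomic flag, and the
task queue polls it, so the process keeps running enough to release
open files and save progress.

    #include <atomic>
    #include <signal.h>

    static std::atomic<bool> bees_paused{false};

    extern "C" void handle_sigusr1(int) { bees_paused = true;  }   // pause
    extern "C" void handle_sigusr2(int) { bees_paused = false; }   // resume

    void install_pause_signals()
    {
            struct sigaction sa = {};
            sa.sa_handler = handle_sigusr1;
            sigaction(SIGUSR1, &sa, nullptr);
            sa.sa_handler = handle_sigusr2;
            sigaction(SIGUSR2, &sa, nullptr);
            // worker dispatch loops check bees_paused before taking new Tasks
    }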

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:01:19 -05:00
Zygo Blaxell
1cbc894e6f task: start up more worker threads when unpausing
When paused, TaskConsumer threads will eventually notice the paused
condition and exit; however, there's nothing to restart threads when
exiting the paused state.

When unpausing, and while the lock is already held, create TaskConsumer
threads as needed to reach the target thread count.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:53:00 -05:00
Zygo Blaxell
d74862f1fc fs: set the correct nr_items to 0 in the ENOENT search case
Commit 72c3bf8438830b65cae7bdaff126053e562280e5 ("fs: handle ENOENT
within lib") was meant to prevent exceptions when a subvol is deleted.

If the search ioctl fails, the kernel won't set nr_items in the
ioctl output, which means `nr_items` still has the input value.  When
ENOENT is detected, `this->nr_items` is set to 0, then later `*this =
ioctl_ptr->key` overwrites `this->nr_items` with the original requested
number of items.

This replaced the ENOENT exception with an exception triggered by
interpreting garbage in the memory buffer.  The number of exceptions
was reduced because the memory buffers are frequently reused, but upper
layers would then reject the data or ignore it because it didn't match
the key range.

Fix by setting `ioctl_ptr->key.nr_items`, which then overwrites
`this->nr_items`, so the loop that extracts items from the ioctl data
gets the right number of items (i.e. zero).
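
Roughly, the ordering problem and the fix look like this (an
illustrative fragment; names are taken from the description above and
may not match the real code exactly):

    if (search_ioctl_errno == ENOENT) {
            // before: this->nr_items = 0;   // cleared, then clobbered below
            ioctl_ptr->key.nr_items = 0;     // after: zero the copy's source
    }
    *this = ioctl_ptr->key;                  // now copies nr_items == 0, so the
                                             // extraction loop sees zero items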

Fixes: 72c3bf8438830b65cae7bdaff126053e562280e5 ("fs: handle ENOENT within lib")
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:48:15 -05:00
Zygo Blaxell
e40339856f readahead: use the right parameter order when checking the range
In some cases the offset and size arguments were flipped when checking to
see if a range had already been read.  This would have been OK as long as
the same mistake had been made consistently, since `bees_readahead_check`
only does a cache lookup on the parameters; it doesn't try to use them to
read a file.  Alas, there was one case where the correct order was used,
albeit a relatively rare one.

Fix all the calls to use the correct order.
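
For illustration only, assuming a (fd, offset, size)-style parameter
order (the real signature may differ):

    bees_readahead_check(fd, size, offset);   // wrong: offset and size flipped
    bees_readahead_check(fd, offset, size);   // right: cache lookup keys match
                                              // the order used everywhere else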

Also fix a comment:  the recent request cache is global to all threads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-04 11:17:44 -05:00
Zygo Blaxell
1dd96f20c6 fs: drop extra declaration of hexdump
hexdump was moved into a template in its own header years ago, but
the declaration of the implementation that used to be in fs.cc remains.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-04 11:17:44 -05:00
Zygo Blaxell
cd7a71aba3 hexdump: be a little more lock-friendly
hexdump processes a vector as a contiguous sequence of bytes, regardless
of V's value type, so hexdump should get a pointer and use uint8_t to
read the data.

Some vector types have a lock and some atomics in their operator[], so
let's avoid hammering those.
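
A simplified sketch of the approach (not the real hexdump template):
read the container through its data() pointer as raw uint8_t, so the
loop never goes through operator[] and whatever locking it may do.

    #include <cstddef>
    #include <cstdint>
    #include <iomanip>
    #include <ostream>

    template <class V>
    void hexdump_bytes(std::ostream &os, const V &v)
    {
            const auto *p = reinterpret_cast<const uint8_t *>(v.data());
            const size_t bytes = v.size() * sizeof(typename V::value_type);
            for (size_t i = 0; i < bytes; ++i) {
                    os << std::hex << std::setw(2) << std::setfill('0')
                       << unsigned(p[i]) << ((i % 16 == 15) ? '\n' : ' ');
            }
    }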

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 23:39:33 -05:00
Zygo Blaxell
e99a505b3b bytevector: don't deadlock on operator<<
operator<< was a friend function that locked the ByteVector, then invoked
hexdump on the bytevector, which used ByteVector::operator[]...which
locked the ByteVector, resulting in a deadlock.

operator<< shouldn't be a friend anyway.  Make hexdump use the
normal public access methods for ByteVector.
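
The deadlock pattern, reduced to a sketch (hypothetical simplified type;
in the real code the mutex is private and operator<< was declared a
friend):

    #include <cstddef>
    #include <mutex>
    #include <ostream>

    struct MiniByteVector {
            mutable std::mutex m;
            unsigned char buf[16] = {};
            unsigned char at(size_t i) const {
                    std::unique_lock<std::mutex> lock(m);   // second lock attempt...
                    return buf[i];
            }
    };

    std::ostream &operator<<(std::ostream &os, const MiniByteVector &v)
    {
            std::unique_lock<std::mutex> lock(v.m);         // ...first lock held here
            // for (size_t i = 0; i < sizeof(v.buf); ++i) os << int(v.at(i));
            // ^ would self-deadlock: std::mutex is not recursive
            return os;
    }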

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 23:39:33 -05:00
Zygo Blaxell
3e89fe34ed roots: avoid copying a BtrfsIoctlSearchKey
Although all the members of BtrfsExtentDataFetcher are theoretically
copyable, there's no need to actually make any such copy.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-03 16:54:14 -05:00
Zygo Blaxell
dc74766179 context: spell "progress" correctly
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-02 09:50:28 -05:00
Zygo Blaxell
3a33a5386b context: add a PROGRESS: header in $BEESSTATUS
Make it clearer where the progress information goes.

Also add placeholder text so the progress section isn't empty at startup,
when the progress hasn't been calculated yet.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 11:41:59 -05:00
Zygo Blaxell
69e9bdfb0f docs: post-5.7 toxic extent handling
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
v0.11-rc1
2024-12-01 00:17:52 -05:00
Zygo Blaxell
7a197e2f33 bees: post-kernel-5.7 toxic extent handling
Toxic extents are mostly gone in kernel 5.7 and later.  Increase the
timeout for toxic extent handling to reduce false positives, and remove
persistently stored toxic hashes from the hash table.

Toxic hashes are still stored nonpersistently to help mitigate problems
due to any remaining kernel bugs.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:52 -05:00
Zygo Blaxell
43d38ca536 extent scan: don't serialize dedupe and LOGICAL_INO when using extent scan mode
The serialization doesn't seem to be necessary for the extent scan mode.
No infinite loops in the kernel have been observed in the past two years,
despite never having used MultiLock for the extent scanner.

Leave the serialization for now on the subvol scanners.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:52 -05:00
Zygo Blaxell
7b0ed6a411 docs: default scan mode is 4, "extent"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
8d4d153d1d main: set default scan mode to mode 4 (EXTENT)
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
d5a6c30623 docs: old missing features are not missing any more
The extent scan mode has been implemented (partially, but close enough
to win benchmarks).

New features include several nuisance dedupe countermeasures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
25f7ced27b docs: add scan mode 4, "extent"
Extent is a different kind of scan mode, so introduce the concept of
the two kinds of scan mode, and rearrange the description of scan modes
along the new boundaries.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
c1af219246 progress: squeeze the progress table into 80 columns or less
We don't need the subvol numbers since they're only interesting to
developers.

We don't need both max and min sizes; pick one and drop the other.

Replace "16E" with "max"--it is the same number of characters, but
doesn't require the user to know what 1<<64 is off the top of their head.

Shorten "remain" to "todo" because sometimes those extra two columns
matter.

Drop the seconds field in ETA timestamps.  Long scan arrival times are
years away, and short scan arrival times are only updated once every
5 minutes, so the extra precision isn't useful.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
9c183c2c22 progress: put the progress table in the stats and status files
Make the progress information more accessible, without having to
enable full debug log and fish it out of the stream with grep.

Also increase the progress log level to INFO.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
59f8a467c3 extent scan: fix crawl_map creation
There are two crawl_maps in extent scan's next_transid:  one gets
initialized, the other gets used.  This works OK as long as bees is
resuming an existing scan, because the two maps are identical; however,
it fails if bees is starting without an existing set of crawl data,
and one of the two maps is empty or partially filled.

The failure is intermittent, as the crawl map is being populated at
the same time next_transid runs.  It will eventually be completed after
several transaction cycles, at which point bees runs normally.
It does add significant delays during startup, which shows up in benchmarks.

There's only one crawl_map in extent scan; it always has the same
crawlers, and extent scan's `next_transid` creates it by itself.
Ignore the map from BeesRoots/BeesCrawl.

Also throw in some missing but helpful trace statements.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
9987aa8583 progress: estimate actual data sizes for progress report
Replace pointers in the "done" and "total" columns with estimated data
sizes for each size tier.  The estimation is based on statistics
collected from extents scanned during the current bees run.

Move the total size for the entire filesystem up to the heading.

Report the _completed_ position (i.e. the one that would be saved in
`beescrawl.dat`), not the _queued_ position (i.e. the one where the
next Task would be created in memory).

At the end of the data, the crawl pointer ends up at some random point
in the filesystem just after the newest extent, so the progress gets to
99.7% and then goes to some random value like 47% or 3%, not to 100%.
Report "deferred" in the "done" column when the crawler is waiting for
the next transid, and "finished" in the "%done" column when the crawler
has reached the end of the data.  Suppress the ETA when finished.  This
makes it clear that there's no further work to do for these crawlers.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
da32667e02 docs: add event counters for extent scan
Add a section for all the new extent scan event counters.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
8080abac97 extent scan: refactor BeesScanMode so derived classes decide their own scan scheduling
BeesScanModeExtent uses six scan Tasks instead of one, which leads
to awkwardness like the do_scan method having to tell crawl_roots how
to do what it shouldn't need to know how to do anyway.

Move the crawl_roots logic into the ::scan methods themselves.

This also deletes the very popular "crawl_more ran out of data" message.
Extent scan explicitly indicates when a scan is complete, so there's
no longer a need to fish this message out of the log.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
1e139d0ccc extent scan: put all the refs in a single Task, sort them, use idle task
The sorting avoids problematic read orders, like extent refs in the same
inode with descending offsets, that btrfs is not optimized for.

Putting everything in one Task keeps the queue sizes small, and
manages the lock contention much more calmly.

We only want to be mapping extent refs if there aren't enough extents
already in the queue to keep worker threads busy, so use the `idle()`
method instead of `run()`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
6542917ffa extent scan: introduce SCAN_MODE_EXTENT
The EXTENT scan mode reads the extent tree, splits it into tiers by
extent size, converts each tier's extents into subvol/inode/offset refs,
then runs the legacy bees dedupe engine on the refs.

The extent scan mode can cheaply compute completion percentage and ETA,
so do that every time a new transid is observed.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-01 00:17:51 -05:00
Zygo Blaxell
b99d80b40f task: add an idle queue
Add a second level queue which is only serviced when the local and global
queues are empty.

At some point there might be a need to implement a full priority queue,
but for now two classes are sufficient.
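
A minimal sketch of the servicing order (not the actual
Task/TaskConsumer code): the idle queue is only consulted when both the
consumer-local queue and the global queue are empty.

    #include <deque>
    #include <functional>

    struct MiniQueues {
            std::deque<std::function<void()>> local, global, idle;

            bool pop_one(std::function<void()> &out) {
                    for (auto *q : { &local, &global, &idle }) {   // priority order
                            if (!q->empty()) {
                                    out = std::move(q->front());
                                    q->pop_front();
                                    return true;
                            }
                    }
                    return false;   // nothing to run anywhere
            }
    };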

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
099ad2ce7c fs: add some performance metrics for TREE_SEARCH_V2 calls
These give some visibility into how efficiently bees is using the
TREE_SEARCH_V2 ioctl.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
a59a02174f table: add a simple text table renderer
This should help clean up some of the uglier status outputs.

Supports:

 * multi-line table cells
 * character fills
 * sparse tables
 * insert, delete by row and column
 * vertical separators

and not much else.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
e22653e2c6 docs: remove "matched_" prefix event counters
We can no longer reliably determine the number of hash table matches,
since we'll stop counting after the first one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
44810d6df8 scan_one_extent: remove the unreadahead after benchmark results
That unreadahead used to result in a 10% hit on benchmarks.  Now it's
closer to 75%.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
8f92b1dacc BeesRangePair: drop the _really_ expensive toxic extent workaround
We were doing a `LOGICAL_INO` ioctl on every _block_ of a matching extent,
just to see how long it takes.  It takes a while!

This could be modified to do an ioctl with the `IGNORE_OFFSET` flag,
once per new extent, but the kernel bug was fixed a long time ago, so
we can start removing all the toxic extent code.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
0b974b5485 scan_one_extent: in skip/scan lines, log whether extent is compressed
Useful for debugging the compressed-zero-block cases.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
ce0367dafe scan_one_extent: reduce the number of LOGICAL_INO calls before finding a duplicate block range
When we have multiple possible matches for a block, we proceed in three
phases:

1.  retrieve each match's extent refs and put them in a list,
2.  iterate over the list converting viable block matches into range matches,
3.  sort and flatten the list of range matches into a non-overlapping
list of ranges that cover all duplicate blocks exactly once.

The separation of phase 1 and 2 creates a performance issue when there
are many block matches in phase 1, and all the range matches in phase
2 are the same length.  Even though we might quickly find the longest
possible matching range early in phase 2, phase 1 first extracts all of
the extent refs from every possible matching block, and most of those
refs will never be used.

Fix this by moving the extent ref retrieval in phase 1 into a single
loop in phase 2, and stop looping over matching blocks as soon as any
dedupe range is created.  This avoids iterating over a large list of
blocks with expensive `LOGICAL_INO` ioctls in an attempt to improve the
match when there is no hope of improvement, e.g. when all match ranges
are 4K and the content is extremely prevalent in the data.

If we find a matched block that is part of a short matching range,
we can replace it with a block that is part of a long matching range,
because there is a good chance we will find a matching hash block in
the long range by looking up hashes after the end of the short range.
In that case, overlapping dedupe ranges covering both blocks in the
target extent will be inserted into the dedupe list, and the longest
matches will be selected in phase 3.  This usually provides a similar
result to that of the loop in phase 1, but _much_ more efficiently.

Some operations are left in phase 1, but they are all using internal
functions, not ioctls.
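
In outline, the restructuring looks like this (hypothetical helper
names, not the real functions):

    // before: phase 1 resolved refs for every candidate block up front
    for (const auto &block : matching_blocks)
            ref_lists.push_back(logical_ino_refs(block));    // expensive ioctl each time
    for (const auto &refs : ref_lists)
            extend_to_range_matches(refs, dedupe_list);

    // after: resolve lazily inside the phase-2 loop, and stop at the first
    // block that yields a dedupe range; phase 3 still sorts and flattens
    for (const auto &block : matching_blocks) {
            const auto refs = logical_ino_refs(block);       // only while still needed
            if (extend_to_range_matches(refs, dedupe_list))
                    break;
    }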

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
54ed6e1cff docs: event counter updates after fixing counter names and scan_one_extent improvements
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
24b08ef7b7 scan_one_extent: eliminate nuisance dedupes, drop caches after reading data
A laundry list of problems fixed:

 * Track which physical blocks have been read recently without making
 any changes, and don't read them again.

 * Separate dedupe, split, and hole-punching operations into distinct
 planning and execution phases.

 * Keep the longest dedupe from overlapping dedupe matches, and flatten
 them into non-overlapping operations.

 * Don't scan extents that have blocks already in the hash table.
 We can't (yet) touch such an extent without making unreachable space.
 Let them go.

 * Give better information in the scan summary visualization:  show dedupe
 range start and end points (<ddd>), matching blocks (=), copy blocks
 (+), zero blocks (0), inserted blocks (.), unresolved match blocks
 (M), should-have-been-inserted-but-for-some-reason-wasn't blocks (i),
 and there's-a-bug-we-didn't-do-this-one blocks (#).

 * Drop cached data from extents that have been inserted into the hash
 table without modification.

 * Rewrite the hole punching for uncompressed extents, which apparently
 hasn't worked properly since the beginning.

Nuisance dedupe elimination (see the sketch after this list):

 * Don't do more than 100 dedupe, copy, or hole-punch operations per
 extent ref.

 * Don't split an extent or punch a hole unless dedupe would save at
 least half of the extent ref's size.

 * Write a "skip:" summary showing the planned work when nuisance
 dedupe elimination decides to skip an extent.
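
A sketch of the two thresholds above (hypothetical types; only the
100-operation and half-the-ref-size rules come from this commit):

    #include <cstddef>
    #include <cstdint>

    struct ExtentRefPlan {
            size_t   dedupe_ops = 0, copy_ops = 0, punch_ops = 0;
            uint64_t bytes_saved = 0;   // bytes dedupe would recover from this ref
            uint64_t ref_size = 0;      // total size of the extent ref
    };

    bool too_many_ops(const ExtentRefPlan &p)
    {
            // more than 100 planned operations: log "skip:" and move on
            return p.dedupe_ops + p.copy_ops + p.punch_ops > 100;
    }

    bool allow_split_or_punch(const ExtentRefPlan &p)
    {
            // only split or hole-punch if dedupe saves at least half the ref
            return p.bytes_saved * 2 >= p.ref_size;
    }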

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
97eab9655c types: add shrink_begin and shrink_end methods for BeesFileRange and BeesRangePair
These allow trimming of overlapping dedupes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
05bf1ebf76 counters: fix counter names for scan_eof, scan_no_fd, scanf_deferred_inode
This code gets moved around from time to time and ends up with the
wrong prefix.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
606ac01d56 multilock: allow turning it off
Add a master switch to turn off the entire MultiLock infrastructure for
testing, without having to remove and add all the individual entry points.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
72c3bf8438 fs: handle ENOENT within lib
This prevents the storms of exceptions that occur when a subvol is
deleted.  We simply treat the entire tree as if it was empty.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
72958a5e47 btrfs-tree: accessors for TreeFetcher classes' type and tree values
Sometimes we have a generic TreeFetcher and we need to know which tree
it came from.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
f25b4c81ba btrfs-tree: add root refs and extent flags fields
Lazily filling in accessor methods for btrfs objects as needed by bees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
a64603568b task: fix try_lock argument description
try_lock allows specification of a different Task to be run instead of
the current Task when the lock is busy.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
33cde5de97 bees: increase file cache size limits
With some extents having 9999 refs, we can use much larger caches for
file descriptors.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
5414c7344f docs: resolve_overflow limit is only 655050 when BTRFS_MAX_EXTENT_REF_COUNT is
Use the current header value in the doc.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
8bac00433d bees: reduce extent ref limit to 9999
Originally the limit was 2730 (64KiB worth of ref pointers).  This limit
was a little too low for some common workloads, so it was then raised by
a factor of 256 to 699050, but there are a lot of problems with extent
counts that large.  Most of those problems are memory usage and speed
problems, but some of them trigger subtle kernel MM issues.

699050 references is too many to be practical.  Set the limit to 9999,
only 3-4x larger than the original 2730, to give up on deduplication
when each deduped ref reduces the amount of space by no more than 0.01%.
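
A back-of-envelope check of the numbers above (the 24-byte size per ref
pointer is an assumption of this sketch, not stated in the commit):

    #include <cstdio>

    int main()
    {
            printf("%d refs\n", 64 * 1024 / 24);   // ~2730 refs in 64KiB of pointers
            printf("%.4f%%\n", 100.0 / 9999);      // ~0.0100% of the space is
                                                   // recovered per ref at the new limit
    }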

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
088cbc951a docs: event counter updates after readahead sanity improvements
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
e78e05e212 readahead: inject more sanity at the foundation of an insane architecture
This solves a third bad problem with bees reads:

3.  The architecture above the read operations will issue read requests
for the same physical blocks over and over in a short period of time.

Fixing that properly requires rewriting the upper-level code, but a
simple small table of recent read requests can reduce the effect of the
problem by orders of magnitude.
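
A minimal sketch of such a table (hypothetical shape; the real cache is
shared by all threads and has a bounded size with eviction): remember
recently requested ranges and skip readahead for repeats.

    #include <cstdint>
    #include <mutex>
    #include <set>
    #include <tuple>

    struct RecentReads {
            std::mutex m;
            std::set<std::tuple<int, uint64_t, uint64_t>> seen;   // fd, offset, size

            // true the first time a range is requested, false on repeats
            bool should_readahead(int fd, uint64_t offset, uint64_t size) {
                    std::unique_lock<std::mutex> lock(m);
                    return seen.insert({fd, offset, size}).second;
            }
    };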

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00